Foundations and Trends® in Theoretical Computer Science, Vol. 1, Issue 2

Data Streams: Algorithms and Applications

By S. Muthukrishnan, Rutgers University, USA, muthu@cs.rutgers.edu

 
Suggested Citation
S. Muthukrishnan (2005), "Data Streams: Algorithms and Applications", Foundations and Trends® in Theoretical Computer Science: Vol. 1: No. 2, pp 117-236. http://dx.doi.org/10.1561/0400000002

Publication Date: 27 Sep 2005
© 2005 S. Muthukrishnan
 
Subjects
Database theory
 

In this article:
1 Introduction 
2 Map 
3 The Data Stream Phenomenon 
4 Data Streaming: Formal Aspects 
5 Foundations: Basic Mathematical Ideas 
6 Foundations: Basic Algorithmic Techniques 
7 Foundations: Summary 
8 Streaming Systems 
9 New Directions 
10 Historic Notes 
11 Concluding Remarks 
Acknowledgements 
References 

Abstract

In the data stream scenario, input arrives very rapidly and there is limited memory to store the input. Algorithms have to work with one or few passes over the data, space less than linear in the input size or time significantly less than the input size. In the past few years, a new theory has emerged for reasoning about algorithms that work within these constraints on space, time, and number of passes. Some of the methods rely on metric embeddings, pseudo-random computations, sparse approximation theory and communication complexity. The applications for this scenario include IP network traffic analysis, mining text message streams and processing massive data sets in general. Researchers in Theoretical Computer Science, Databases, IP Networking and Computer Systems are working on the data stream challenges. This article is an overview and survey of data stream algorithmics and is an updated version of [175].

DOI: 10.1561/0400000002
Paperback: ISBN 978-1-933019-14-7, 135 pp., $60.00
E-book (PDF): ISBN 978-1-933019-60-4, 135 pp., $100.00

Data Streams

Data stream algorithms as an active research agenda emerged only over the past few years, even though the concept of making few passes over the data for performing computations has been around since the early days of Automata Theory. The data stream agenda now pervades many branches of Computer Science including databases, networking, knowledge discovery and data mining, and hardware systems. Industry is in sync too, with Data Stream Management Systems (DSMSs) and special hardware to deal with data speeds. Even beyond Computer Science, data stream concerns are emerging in physics, atmospheric science and statistics. Data Streams: Algorithms and Applications focuses on the algorithmic foundations of data streaming and surveys the emerging area of algorithms for processing data streams and associated applications. An extensive bibliography with over 200 entries points the reader to further resources for exploration.

 