Foundations and Trends® in Databases, Vol. 5, Issue 1

Massively Parallel Databases and MapReduce Systems

By Shivnath Babu, Duke University, USA, shivnath@cs.duke.edu | Herodotos Herodotou, Microsoft Research, USA, herohero@microsoft.com

 
Suggested Citation
Shivnath Babu and Herodotos Herodotou (2013), "Massively Parallel Databases and MapReduce Systems", Foundations and Trends® in Databases: Vol. 5: No. 1, pp. 1-104. http://dx.doi.org/10.1561/1900000036

Publication Date: 20 Nov 2013
© 2013 S. Babu and H. Herodotou
 
Subjects
Parallel and Distributed Database Systems, Data Mining and OLAP, Distributed Computing
 

In this article:
1. Introduction 
2. Classic Parallel Database Systems 
3. Columnar Database Systems 
4. MapReduce Systems 
5. Dataflow Systems 
6. Conclusions 
References 

Abstract

Timely and cost-effective analytics over "big data" has emerged as a key ingredient for success in many businesses, scientific and engineering disciplines, and government endeavors. Web clicks, social media, scientific experiments, and datacenter monitoring are among the data sources that generate vast amounts of raw data every day. The need to convert this raw data into useful information has spawned considerable innovation in systems for large-scale data analytics, especially over the last decade. This monograph covers the design principles and core features of systems for analyzing very large datasets using massively parallel computation and storage techniques on large clusters of nodes. We first discuss how the requirements of data analytics have evolved since the early work on parallel database systems. We then describe some of the major technological innovations that have each spawned a distinct category of systems for data analytics. Each system category is described along a number of dimensions, including data model and query interface, storage layer, execution engine, query optimization, scheduling, resource management, and fault tolerance. We conclude with a summary of present trends in large-scale data analytics.

DOI: 10.1561/1900000036
ISBN: 978-1-60198-750-1 (paperback), 120 pp., $85.00
ISBN: 978-1-60198-751-8 (e-book, PDF), 120 pp., $115.00

Massively Parallel Databases and MapReduce Systems addresses the design principles and core features of systems for analyzing very large datasets using massively parallel computation and storage techniques on large clusters of nodes. It first discusses how the requirements of data analytics have evolved since the early work on parallel database systems. It then describes some of the major technological innovations that have each spawned a distinct category of systems for data analytics. Each system category is described along a number of dimensions, including data model and query interface, storage layer, execution engine, query optimization, scheduling, resource management, and fault tolerance. It concludes with a summary of present trends in large-scale data analytics.

Massively Parallel Databases and MapReduce Systems is an ideal reference for anyone with a research or professional interest in large-scale data analytics.

 