Foundations and Trends® in Databases > Vol 5 > Issue 3

The Design and Implementation of Modern Column-Oriented Database Systems

By Daniel Abadi, Yale University, USA, dna@cs.yale.edu | Peter Boncz, CWI, The Netherlands, P.Boncz@cwi.nl | Stavros Harizopoulos, Amiato, Inc., USA, stavros@amiato.com | Stratos Idreos, Harvard University, USA, stratos@seas.harvard.edu | Samuel Madden, MIT Computer Science and Artificial Intelligence Laboratory, USA, madden@csail.mit.edu

 
Suggested Citation
Daniel Abadi, Peter Boncz, Stavros Harizopoulos, Stratos Idreos and Samuel Madden (2013), "The Design and Implementation of Modern Column-Oriented Database Systems", Foundations and Trends® in Databases: Vol. 5: No. 3, pp 197-280. http://dx.doi.org/10.1561/1900000024

Publication Date: 04 Dec 2013
© 2013 D. Abadi, P. Boncz, S. Harizopoulos, S. Idreos, and S. Madden
 
Subjects
Storage, Access Methods, and Indexing,  Query Processing and Optimization,  Database Design and Tuning
 

In this article:
1. Introduction 
2. History, Trends, and Performance Tradeoffs 
3. Column-store Architectures 
4. Column-store Internals and Advanced Techniques 
5. Discussion, Conclusions, and Future Directions 
References 

Abstract

In this article, we survey recent research on column-oriented database systems, or column-stores, where each attribute of a table is stored in a separate file or region on storage. Such databases have seen a resurgence in recent years with a rise in interest in analytic queries that perform scans and aggregates over large portions of a few columns of a table. The main advantage of a column-store is that it can access just the columns needed to answer such queries. We specifically focus on three influential research prototypes, MonetDB [46], MonetDB/X100 [18], and C-Store [86]. These systems have formed the basis for several well-known commercial column-store implementations. We describe their similarities and differences and discuss their specific architectural features for compression, late materialization, join processing, vectorization and adaptive indexing (database cracking).

DOI:10.1561/1900000024
ISBN: 978-1-60198-754-9 (paperback), 100 pp., $75.00
ISBN: 978-1-60198-755-6 (e-book, PDF), 100 pp., $115.00

The Design and Implementation of Modern Column-Oriented Database Systems

Database system performance is directly related to the efficiency of the system at storing data on primary storage (for example, disk) and moving it into CPU registers for processing. For this reason, there is a long history in the database community of research exploring physical storage alternatives, including sophisticated indexing, materialized views, and vertical and horizontal partitioning. In recent years, there has been renewed interest in so-called column-oriented systems, sometimes also called column-stores. Column-store systems completely vertically partition a database into a collection of individual columns that are stored separately. By storing each column separately on disk, these column-based systems enable queries to read just the attributes they need, rather than having to read entire rows from disk and discard unneeded attributes once they are in memory.
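
To make the access pattern concrete, the following is a minimal in-memory sketch, not code from MonetDB, MonetDB/X100, or C-Store. The toy table, its attribute names, and the helper functions (to_columns, sum_row_store, sum_column_store) are all hypothetical, and counting "values read" is only an assumed stand-in for disk I/O. It shows an aggregate over one attribute touching every value in a row layout but only one column in a vertically partitioned layout.

```python
# Hypothetical toy table with four attributes per row (illustrative only).
COLUMNS = ("order_id", "customer", "region", "amount")
rows = [
    (1, "alice", "EU", 30),
    (2, "bob", "US", 45),
    (3, "carol", "EU", 25),
]


def to_columns(rows, names):
    """Vertically partition a row-oriented table into one list per attribute."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def sum_row_store(rows, names, attr):
    """Aggregate one attribute by scanning whole rows.

    Every value of every row is counted as read, even though only one
    attribute per row contributes to the result.
    """
    idx = names.index(attr)
    total = 0
    values_read = 0
    for row in rows:
        values_read += len(row)  # the entire row is fetched
        total += row[idx]        # unneeded attributes are discarded
    return total, values_read


def sum_column_store(columns, attr):
    """Aggregate one attribute by scanning only that attribute's column."""
    column = columns[attr]
    return sum(column), len(column)


columns = to_columns(rows, COLUMNS)
row_total, row_read = sum_row_store(rows, COLUMNS, "amount")
col_total, col_read = sum_column_store(columns, "amount")

print("row layout:   ", row_total, "values read:", row_read)
print("column layout:", col_total, "values read:", col_read)
```

Both layouts return the same sum, but the row layout reads all four attributes of every row while the column layout reads one value per row. In this sketch the gap grows with the number of attributes in the table that the query does not need.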

The Design and Implementation of Modern Column-Oriented Database Systems discusses modern column-stores, their architecture and evolution, as well as the benefits they can bring in data analytics. There is a specific focus on three influential research prototypes: MonetDB, MonetDB/X100, and C-Store. These systems have formed the basis for several well-known commercial column-store implementations. Their similarities and differences are described, and they are discussed in terms of their specific architectural features for compression, late materialization, join processing, vectorization and adaptive indexing (database cracking).

The Design and Implementation of Modern Column-Oriented Database Systems is an excellent reference on the topic for database researchers and practitioners.

 
DBS-024