Different notions of provenance for database queries have been proposed and studied in the past few years. In this article, we detail three main notions of database provenance and some of their applications, and compare and contrast them. Specifically, we review why, how, and where provenance, describe the relationships among these notions of provenance, and describe some of their applications in confidence computation, view maintenance and update, debugging, and annotation propagation.
In September 2008, Google News promoted an undated article about United Airlines' near bankruptcy in 2002. In the ensuing panic, the share price of United Airlines dropped by around 75% in a few hours. This problem was due in part to the fact that the article lacked provenance that readers could have used to determine that it was out of date. In an increasingly networked world, an understanding of provenance is essential for establishing trust in data stored in databases and exchanged among Web sites. It is also critical to the process of making key business, scientific, and governmental decisions. Modern database systems are capable of producing answers efficiently. However, they generally lack the ability to explain provenance: why and how the answers were produced, or where the data in the result came from. In recent years, different notions of provenance for database queries have been studied by the authors and a growing community of researchers in databases and scientific computation.
Provenance in Databases reviews research over the past ten years on why, how, and where provenance, clarifies the relationships among these notions of provenance, and describes some of their applications in confidence computation, view maintenance and update, debugging, and annotation propagation. Provenance in Databases is intended for engineers and researchers who would like to familiarize themselves with the foundations of database provenance, as well as the many challenges in the field.