
Data Provenance

By Boris Glavic, Illinois Institute of Technology, USA

Suggested Citation
Boris Glavic (2021), "Data Provenance", Foundations and Trends® in Databases: Vol. 9: No. 3-4, pp 209-441.

Publication Date: 28 Apr 2021
© 2021 Boris Glavic
Metadata management, Trust and provenance

In this article:
1. Introduction
2. Provenance Models - Formalizing Provenance Semantics
3. Applications
4. Provenance Capture, Storage, and Querying
5. Connection to Other Research Fields
6. Summary and Conclusions


Data provenance has evolved from a niche topic into a mainstream area of research in databases and other research communities. This article gives a comprehensive introduction to data provenance. The main focus is on provenance in the context of databases. However, it is also insightful to consider connections to related research in programming languages, software engineering, the semantic web, formal logic, and other communities. The target audience is researchers and practitioners who want to gain a solid understanding of data provenance and of the state of the art in this research area. The article assumes only that the reader has a basic understanding of database concepts, not any prior exposure to provenance.

Paperback: ISBN 978-1-68083-828-2, 246 pp., $99.00
E-book (PDF): ISBN 978-1-68083-829-9, 246 pp., $280.00

Data Provenance: Origins, Applications, Algorithms, and Models

The term provenance is used in the art world to describe a record of the history of ownership of a piece of art. The database community has adapted this term to describe a record of the origin of a piece of data. Data provenance emerged as a research topic in the database community in the late 1990s. By explaining how the result of an operation was derived from its inputs, data provenance has proven to be a useful tool in a wide variety of applications.

This monograph gives a comprehensive introduction to the data provenance concepts, algorithms, and methodology developed over the last few decades. It introduces the reader to the formalisms, algorithms, and systems developed in this fascinating field, and it provides a collection of relevant literature references for further research. The monograph is a concise starting point both for research into data provenance and for its practical use. Although the focus is on data provenance in databases, pointers to work in other fields are given throughout.

The intended audience is researchers and practitioners unfamiliar with the topic who want to develop a basic understanding of provenance techniques and the state of the art in the field, as well as researchers with prior experience in provenance who want to broaden their horizons.