By Joseph P. Near, University of Vermont, jnear@uvm.edu | Chiké Abuah, University of Vermont, abuahchu@gmail.com
This monograph provides a comprehensive, example-driven introduction to differential privacy through the lens of practical programming. Bridging the gap between theory and implementation, it walks readers through core mechanisms, such as the Laplace and Exponential mechanisms, smooth sensitivity, sample-and-aggregate, and the sparse vector technique, all while grounding each in executable Python examples. With an emphasis on hands-on learning and reproducibility, the material is designed for learners who wish to move beyond abstract definitions and understand how differential privacy is applied to real datasets, models, and systems.
The sections of this monograph are organized around both foundational theory and practical concerns: sensitivity and composition are introduced early, followed by in-depth treatments of de-identification, synthetic data generation, and private machine learning. The monograph also discusses subtle implementation concerns such as efficiency, numerical stability, and privacy accounting, as well as optimizations like ghost clipping. Wherever possible, examples use familiar tools like pandas, NumPy, and matplotlib to keep the material accessible to data scientists.
By making privacy-preserving algorithms concrete and programmable, the monograph aims to lower the barrier to entry for researchers, engineers, and educators working with sensitive data. It is intended as both a self-contained reference and a foundation for further exploration into formal privacy and responsible AI.
This is a monograph about differential privacy, for programmers. It is intended to give readers an introduction to the challenges of data privacy, introduce them to the techniques that have been developed for addressing those challenges, and help them understand how to implement some of those techniques. The monograph contains numerous examples written as programs, including implementations of many concepts. It assumes a working knowledge of Python, as well as basic knowledge of the pandas and NumPy libraries. Readers will also benefit from some background in discrete mathematics and probability.
This monograph is primarily focused on differential privacy. The first couple of sections outline some of the reasons why differential privacy (and its variants) is the only formal approach we know about that seems to provide robust privacy protection. Approaches that have been in common use for decades (like de-identification and aggregation) have more recently been shown to break down under sophisticated privacy attacks, and even more modern techniques (like k-Anonymity) are susceptible to certain attacks. For this reason, differential privacy is fast becoming the gold standard in privacy protection, and thus it is the primary focus of this work.