Foundations and Trends® in Statistics > Vol 1 > Issue 3-4

A Modern Theory of Cross-validation Through the Lens of Stability

By Jing Lei, Carnegie Mellon University, USA, jinglei@andrew.cmu.edu

 
Suggested Citation
Jing Lei (2025), "A Modern Theory of Cross-validation Through the Lens of Stability", Foundations and Trends® in Statistics: Vol. 1: No. 3-4, pp 391-548. http://dx.doi.org/10.1561/3600000005

Publication Date: 01 Dec 2025
© 2025 J. Lei
 
Subjects
Classification and prediction, Evaluation, Model choice, Nonparametric methods, Statistical learning theory, Online learning, Stochastic optimization, Information theory and statistics
 

In this article:
1. Introduction
2. Risk Consistency of CV by Stability
3. Regression Model Selection Under Squared Loss
4. Central Limit Theorems For Cross-validation
5. Some Applications
6. Stability of Estimators
7. Miscellaneous
Acknowledgements
References

Abstract

Modern data analysis and statistical learning are characterized by two defining features: complex data structures and black-box algorithms. The complexity of data structures arises from advanced data collection technologies and data-sharing infrastructures, such as imaging, remote sensing, wearable devices, and genomic sequencing. In parallel, black-box algorithms—particularly those stemming from advances in deep neural networks—have demonstrated remarkable success on modern datasets. This confluence of complex data and opaque models introduces new challenges for uncertainty quantification and statistical inference, a problem we refer to as “black-box inference”.

The difficulty of black-box inference lies in the absence of traditional parametric or nonparametric modeling assumptions, as well as the intractability of the algorithmic behavior underlying many modern estimators. These factors make it difficult to precisely characterize the sampling distribution of estimation errors. A common approach to address this issue is post-hoc randomization, which includes permutation, resampling, sample splitting, cross-validation, and noise injection. When combined with mild assumptions, such as exchangeability in the data-generating process, these methods can yield valid inference and uncertainty quantification.

Post-hoc randomization methods have a rich history, ranging from classical techniques like permutation tests, the jackknife, and the bootstrap, to more recent developments such as conformal inference. These approaches typically require minimal knowledge about the underlying data distribution or the inner workings of the estimation procedure. While originally designed for varied purposes, many of these techniques rely, either implicitly or explicitly, on the assumption that the estimation procedure behaves similarly under small perturbations to the data. This idea, now formalized under the concept of stability, has become a foundational principle in modern data science. Over the past few decades, stability has emerged as a central research focus in both statistics and machine learning, playing critical roles in areas such as generalization error, data privacy, and adaptive inference.

In this article, we investigate one of the most widely used resampling techniques for model comparison and evaluation, cross-validation (CV), through the lens of stability. We begin by reviewing recent theoretical developments in CV for generalization error estimation and model selection under stability assumptions. We then explore more refined results concerning uncertainty quantification for CV-based risk estimates. By integrating these research directions, we uncover new theoretical insights and methodological tools. Finally, we illustrate their utility across both classical and contemporary topics, including model selection, selective inference, and conformal prediction.
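As a concrete reference point for the object studied here, the sketch below shows plain K-fold CV risk estimation for a generic black-box fitting procedure. It is a minimal illustration, not the monograph's notation: the helper names (`kfold_cv_risk`, `fit`, `loss`) are assumptions introduced only for this example.

```python
import numpy as np

def kfold_cv_risk(X, y, fit, loss, K=5, seed=0):
    """Estimate prediction risk by K-fold cross-validation.

    `fit(X_train, y_train)` must return a callable `predict(X_test)`;
    `loss(y_true, y_pred)` must return per-observation losses.
    """
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), K)  # random partition into K folds
    losses = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        predict = fit(X[train_idx], y[train_idx])            # train on K-1 folds
        losses.append(loss(y[test_idx], predict(X[test_idx])))  # score held-out fold
    return np.concatenate(losses).mean()  # the CV risk estimate

# Illustrative usage: ordinary least squares under squared loss.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=200)
    ols = lambda Xtr, ytr: (lambda Xte: Xte @ np.linalg.lstsq(Xtr, ytr, rcond=None)[0])
    sq_loss = lambda yt, yp: (yt - yp) ** 2
    print(kfold_cv_risk(X, y, ols, sq_loss, K=5))
```

The averaged held-out losses returned here are the CV risk estimates whose consistency (Section 2) and central limit behavior (Section 4) the monograph analyzes under stability assumptions on the fitting procedure.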

ISBN: 978-1-63828-663-9
172 pp.
