Steve Hanneke (2014), "Theory of Disagreement-Based Active Learning", Foundations and Trends® in Machine Learning: Vol. 7: No. 2-3, pp. 131-309. http://dx.doi.org/10.1561/2200000037

© 2014 S. Hanneke

**In this article:**

1. Introduction

2. Basic Definitions and Notation

3. A Brief Review of Passive Learning

4. Lower Bounds on the Label Complexity

5. Disagreement-Based Active Learning

6. Computational Efficiency via Surrogate Losses

7. Bounding the Disagreement Coefficient

8. A Survey of Other Topics and Techniques

References

Active learning is a protocol for supervised machine learning in which a learning algorithm sequentially requests the labels of selected data points from a large pool of unlabeled data. This contrasts with passive learning, where the labeled data are sampled at random. The objective in active learning is to produce a highly accurate classifier, ideally using fewer labels than the number of randomly sampled labeled examples that passive learning needs to reach the same accuracy. This article describes recent advances in our understanding of the theoretical benefits of active learning, and their implications for the design of effective active learning algorithms. Much of the article focuses on a particular technique, namely disagreement-based active learning, which by now has amassed a mature and coherent literature. It also briefly surveys several alternative approaches from the literature. The emphasis is on theorems regarding the performance of a few general algorithms, including rigorous proofs where appropriate. However, the presentation is intended to be pedagogical, focusing on results that illustrate fundamental ideas rather than on obtaining the strongest or most general known theorems. The intended audience includes researchers and advanced graduate students in machine learning and statistics who are interested in gaining a deeper understanding of recent and ongoing developments in the theory of active learning.
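To make the protocol concrete, the following is a minimal sketch of a disagreement-based learner, in the spirit of the CAL algorithm of Cohn, Atlas, and Ladner, for one-dimensional threshold classifiers. Everything in it is an illustrative assumption rather than material from the article: the hypothesis class h_t(x) = 1 if x >= t else 0, the noise-free labels, and the names `cal_threshold`, `oracle`, `pool`, and `target`. The learner requests a label only when the thresholds still consistent with the labels seen so far disagree about the point; otherwise it infers the label at no cost.

```python
import random


def cal_threshold(pool, oracle):
    """CAL-style disagreement-based active learning for 1-D thresholds.

    Hypotheses are h_t(x) = 1 if x >= t else 0. This sketch assumes the
    labels are noise-free and generated by some threshold (realizable case).
    """
    # Version space: all thresholds t with lo < t <= hi.
    lo, hi = float("-inf"), float("inf")
    labels, queries = {}, 0
    for x in pool:
        if lo < x < hi:
            # Consistent thresholds disagree on x, so request its label.
            y = oracle(x)
            queries += 1
            if y == 1:
                hi = x  # the target threshold is at most x
            else:
                lo = x  # the target threshold is greater than x
        else:
            # Every consistent threshold agrees on x: infer the label for free.
            y = 1 if x >= hi else 0
        labels[x] = y
    return labels, queries, (lo, hi)


# Hypothetical setup: a pool of uniform random points and a hidden threshold.
rng = random.Random(0)
pool = [rng.random() for _ in range(1000)]
target = 0.37
oracle = lambda x: int(x >= target)

labels, queries, (lo, hi) = cal_threshold(pool, oracle)
print(f"labels requested: {queries} of {len(pool)}")
```

In this noise-free setting every pool point ends up correctly labeled, yet only the points that fall inside the shrinking interval between `lo` and `hi` cost a label request. A passive learner would instead pay for a label on every sampled point, regardless of how informative it is.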
