Foundations and Trends® in Machine Learning > Vol 12 > Issue 1-2

Introduction to Multi-Armed Bandits

Aleksandrs Slivkins, Microsoft Research NYC, USA, slivkins@microsoft.com
 
Suggested Citation
Aleksandrs Slivkins (2019), "Introduction to Multi-Armed Bandits", Foundations and TrendsĀ® in Machine Learning: Vol. 12: No. 1-2, pp 1-286. http://dx.doi.org/10.1561/2200000068

Published: 08 Nov 2019
© 2019 A. Slivkins
 
Subjects
 

Free Preview:

Article Help

Share

Download article
In this article:
Preface
Introduction: Scope and Motivation
1. Stochastic Bandits
2. Lower Bounds
3. Bayesian Bandits and Thompson Sampling
4. Lipschitz Bandits
5. Full Feedback and Adversarial Costs
6. Adversarial Bandits
7. Linear Costs and Semi-bandits
8. Contextual Bandits
9. Bandits and Games
10. Bandits with Knapsacks
11. Bandits and Incentives
Appendices
Acknowledgements
References

Abstract

Multi-armed bandits a simple but very powerful framework for algorithms that make decisions over time under uncertainty. An enormous body of work has accumulated over the years, covered in several books and surveys. This book provides a more introductory, textbook-like treatment of the subject. Each chapter tackles a particular line of work, providing a self-contained, teachable technical introduction and a brief review of the further developments.

DOI:10.1561/2200000068
ISBN: 978-1-68083-620-2
306 pp. $99.00
Buy book
 
ISBN: 978-1-68083-621-9
306 pp. $280.00
Buy E-book (.pdf)
Table of contents:
Preface
Introduction: Scope and Motivation
1. Stochastic Bandits
2. Lower Bounds
3. Bayesian Bandits and Thompson Sampling
4. Lipschitz Bandits
5. Full Feedback and Adversarial Costs
6. Adversarial Bandits
7. Linear Costs and Semi-bandits
8. Contextual Bandits
9. Bandits and Games
10. Bandits with Knapsacks
11. Bandits and Incentives
Appendices
Acknowledgements
References

Introduction to Multi-Armed Bandits

This book gives a broad and accessible introduction to multi-armed bandits, a rich, multi-disciplinary area of increasing importance. The material is teachable by design: each chapter corresponds to one week of a course. There are no prerequisites other than a certain level of mathematical maturity, roughly corresponding to the basic undergraduate course on algorithms.

Multi-armed bandits a simple but very powerful framework for algorithms that make decisions over time under uncertainty. An enormous, multi-dimensional body of work has accumulated over the years.

How to present this work, let alone make it teachable? The book partitions it into a dozen or so big directions. Each chapter handles one direction, covers the first-order concepts and results on a technical level, and provides a detailed literature review for further exploration. While most of the book is on learning theory, the last three chapters cover various connections to economics and operations research.

The book aims to convey that multi-armed bandits are both deeply theoretical and deeply practical. Apart from all the math, the book is careful about motivation, and discusses the practical aspects in considerable detail (based on the system for contextual bandits developed at Microsoft Research).

Lecturers can use this book for an introductory course on the subject. Such course would be complementary to graduate-level courses on online convex optimization and reinforcement learning.

 
MAL-068