
Learning Representation and Control in Markov Decision Processes: New Frontiers

By Sridhar Mahadevan, Department of Computer Science, University of Massachusetts — Amherst, USA, mahadeva@cs.umass.edu

 
Suggested Citation
Sridhar Mahadevan (2009), "Learning Representation and Control in Markov Decision Processes: New Frontiers", Foundations and Trends® in Machine Learning: Vol. 1: No. 4, pp 403-565. http://dx.doi.org/10.1561/2200000003

Publication Date: 02 Jun 2009
© 2009 S. Mahadevan
 
Subjects
Markov chain Monte Carlo
 


Abstract

This paper describes a novel machine learning framework for solving sequential decision problems called Markov decision processes (MDPs) by iteratively computing low-dimensional representations and approximately optimal policies. A unified mathematical framework for learning representation and optimal control in MDPs is presented, based on a class of singular operators called Laplacians, whose matrix representations have nonpositive off-diagonal elements and zero row sums. Exact solutions of discounted and average-reward MDPs are expressed in terms of a generalized spectral inverse of the Laplacian called the Drazin inverse. A generic algorithm called representation policy iteration (RPI) is presented, which interleaves computing low-dimensional representations and approximately optimal policies. Two approaches for dimensionality reduction of MDPs are described, based on geometric and reward-sensitive regularization, whereby low-dimensional representations are formed by diagonalization or dilation of Laplacian operators. Model-based and model-free variants of the RPI algorithm are presented and compared experimentally on discrete and continuous MDPs. Finally, some directions for future work are outlined.
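To make the role of the Laplacian and its Drazin inverse concrete, the following is a minimal numerical sketch, not code from the monograph: for a small, made-up ergodic chain under a fixed policy, it forms the Laplacian L = I - P and recovers the gain and bias of the average-reward evaluation equations from the group inverse of L, which coincides with the Drazin inverse in this index-one case. The transition matrix, rewards, and variable names are illustrative assumptions only.

import numpy as np

# Minimal sketch (illustrative assumptions, not the monograph's code): for a
# fixed policy on a small ergodic chain, form the Laplacian L = I - P and use
# its group/Drazin inverse to recover the average-reward solution, i.e. the
# gain g and bias h satisfying h + g = r + P h.

P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.6, 0.2],
              [0.0, 0.4, 0.6]])          # made-up transition matrix of a fixed policy
r = np.array([1.0, 0.0, 2.0])            # made-up expected rewards per state

n = P.shape[0]
L = np.eye(n) - P                        # Laplacian: zero row sums, nonpositive off-diagonals

# Stationary distribution pi: left eigenvector of P for eigenvalue 1.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
pi = pi / pi.sum()

W = np.outer(np.ones(n), pi)             # limiting matrix; every row equals pi
Z = np.linalg.inv(L + W)                 # fundamental matrix (I - P + W)^{-1}
L_drazin = Z - W                         # group inverse of L (= Drazin inverse here)

gain = W @ r                             # average reward (constant across states)
bias = L_drazin @ r                      # bias / differential value function

assert np.allclose(bias + gain, r + P @ bias)   # average-reward evaluation equation
print("gain:", gain)
print("bias:", bias)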

DOI: 10.1561/2200000003
Book (paperback): ISBN 978-1-60198-238-4, 176 pp., $110.00
E-book (.pdf): ISBN 978-1-60198-239-1, 176 pp., $150.00
Table of contents:
1: Introduction
2: Sequential Decision Problems
3: Laplacian Operators and MDPs
4: Approximating Markov Decision Processes
5: Dimensionality Reduction Principles in MDPs
6: Basis Construction: Diagonalization Methods
7: Basis Construction: Dilation Methods
8: Model-Based Representation Policy Iteration
9: Basis Construction in Continuous MDPs
10: Model-Free Representation Policy Iteration
11: Related Work and Future Challenges
References

Learning Representation and Control in Markov Decision Processes

Learning Representation and Control in Markov Decision Processes describes methods for automatically compressing Markov decision processes (MDPs) by learning a low-dimensional linear approximation defined by an orthogonal set of basis functions. A unique feature of the text is the use of Laplacian operators, whose matrix representations have nonpositive off-diagonal elements and zero row sums. The generalized inverses of Laplacian operators, in particular the Drazin inverse, are shown to be useful in the exact and approximate solution of MDPs. The author goes on to describe a broad framework for solving MDPs, generically referred to as representation policy iteration (RPI), in which both the basis function representations for approximating value functions and the optimal policy within their linear span are learned simultaneously. Basis functions are constructed by diagonalizing a Laplacian operator or by dilating the reward function or an initial set of bases by powers of the operator. The idea of decomposing an operator by finding its invariant subspaces is shown to be an important principle in constructing low-dimensional representations of MDPs. Theoretical properties of these approaches are discussed, and they are also compared experimentally on a variety of discrete and continuous MDPs. Finally, challenges for further research are briefly outlined. Learning Representation and Control in Markov Decision Processes is a timely exposition of a topic with broad interest within machine learning and beyond.
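As a rough illustration of the two basis-construction routes just described, the sketch below (again an assumption-laden toy, not the monograph's code) builds a diagonalization basis from the smoothest eigenvectors of a graph Laplacian over the state space and a dilation (Krylov) basis from powers of the transition matrix applied to the reward, then compares least-squares projections of the exact discounted value function onto each span. The direct projection merely stands in for the least-squares fixed-point approximations used in the text, and the chain, reward, and discount factor are made up.

import numpy as np

# Minimal sketch (illustrative assumptions, not the monograph's code): compare a
# diagonalization basis (smooth eigenvectors of a graph Laplacian) with a
# dilation basis (Krylov vectors r, Pr, ...) by projecting the exact discounted
# value function of a fixed policy onto each two-dimensional span.

gamma = 0.9                                        # made-up discount factor
P = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.2, 0.6, 0.2, 0.0],
              [0.0, 0.3, 0.5, 0.2],
              [0.0, 0.0, 0.4, 0.6]])               # made-up transition matrix
r = np.array([0.0, 0.0, 0.0, 1.0])                 # made-up reward, nonzero in one state

n = P.shape[0]
V_exact = np.linalg.solve(np.eye(n) - gamma * P, r)  # exact value function of the policy

# (1) Diagonalization: eigenvectors of the combinatorial Laplacian of the
#     state-connectivity graph (reward-independent, proto-value-function style).
A = ((P > 0) | (P.T > 0)).astype(float)            # symmetric adjacency of the state graph
np.fill_diagonal(A, 0.0)
L_graph = np.diag(A.sum(axis=1)) - A               # zero row sums, nonpositive off-diagonals
eigvals, eigvecs = np.linalg.eigh(L_graph)         # eigenvalues in ascending order
Phi_diag = eigvecs[:, :2]                          # two smoothest eigenvectors

# (2) Dilation: an orthonormalized Krylov basis spanned by {r, P r}.
K = np.column_stack([np.linalg.matrix_power(P, k) @ r for k in range(2)])
Phi_krylov, _ = np.linalg.qr(K)

def projection_error(Phi):
    # Least-squares projection of V_exact onto span(Phi); returns the residual norm.
    w, *_ = np.linalg.lstsq(Phi, V_exact, rcond=None)
    return np.linalg.norm(Phi @ w - V_exact)

print("diagonalization basis error:", projection_error(Phi_diag))
print("dilation (Krylov) basis error:", projection_error(Phi_krylov))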

 
MAL-003