Learning Representation and Control in Markov Decision Processes: New Frontiers
Foundations and Trends® in Machine Learning Volume 1 Issue 4 DOI: 10.1561/2200000003
Sridhar Mahadevan
Department of Computer Science, University of Massachusetts -- Amherst, 140 Governor’s Drive, Amherst, MA 01003, USA,
mahadeva@cs.umass.edu
SUGGESTED CITATION:
Sridhar Mahadevan (2009), "Learning Representation and Control in Markov Decision Processes: New Frontiers", Foundations and Trends® in Machine Learning: Vol. 1: No. 4, pp 403-565.
http://dx.doi.org/10.1561/2200000003
Abstract
This paper describes a novel machine learning framework for solving sequential decision problems called Markov decision processes
(MDPs) by iteratively computing low-dimensional representations and approximately optimal policies. A unified mathematical
framework for learning representation and optimal control in MDPs is presented based on a class of singular operators called
Laplacians, whose matrix representations have nonpositive off-diagonal elements and zero row sums. Exact solutions of discounted
and average-reward MDPs are expressed in terms of a generalized spectral inverse of the Laplacian called the Drazin inverse. A generic algorithm called representation policy iteration (RPI) is presented which interleaves computing low-dimensional representations and approximately optimal policies. Two approaches
for dimensionality reduction of MDPs are described based on geometric and reward-sensitive regularization, whereby low-dimensional
representations are formed by diagonalization or dilation of Laplacian operators. Model-based and model-free variants of the RPI algorithm are presented and compared experimentally
on discrete and continuous MDPs. Finally, some directions for future work are outlined.
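
As a small illustration of the two objects named in the abstract, the following sketch builds a Laplacian from a transition matrix and computes its Drazin inverse. It rests on assumptions that are not stated in the abstract: the Laplacian is taken to be L = I - P for a row-stochastic matrix P, the chain is assumed ergodic, and the Drazin inverse is obtained from the standard identity (I - P + P*)^{-1} - P*, where P* is the limiting matrix whose rows all equal the stationary distribution. The function names and the three-state example chain are hypothetical, chosen only for illustration, and this is not necessarily how the paper computes these quantities.

```python
import numpy as np


def laplacian_from_transitions(P):
    """Return L = I - P for a row-stochastic matrix P (assumed form)."""
    P = np.asarray(P, dtype=float)
    return np.eye(P.shape[0]) - P


def limiting_matrix(P):
    """Limiting matrix P* of an ergodic chain: each row is the stationary distribution."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    # Solve pi (I - P) = 0 together with sum(pi) = 1 in the least-squares sense.
    A = np.vstack([(np.eye(n) - P).T, np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.tile(pi, (n, 1))


def drazin_inverse_of_laplacian(P):
    """Drazin inverse of L = I - P via (I - P + P*)^{-1} - P* (ergodic case only)."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    P_star = limiting_matrix(P)
    return np.linalg.inv(np.eye(n) - P + P_star) - P_star


# Hypothetical three-state ergodic chain, used only as an example.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])

L = laplacian_from_transitions(P)
L_D = drazin_inverse_of_laplacian(P)

# Laplacian properties from the abstract: zero row sums, nonpositive off-diagonals.
off_diagonal = L - np.diag(np.diag(L))
print("row sums:", L.sum(axis=1))
print("max off-diagonal entry:", off_diagonal.max())
print("L L^D L == L:", np.allclose(L @ L_D @ L, L))
```

Because L is singular, an ordinary inverse does not exist; the checks at the end confirm the properties the abstract attributes to the Laplacian and one of the defining identities of the generalized inverse for this particular example.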