Foundations and Trends® in Robotics, Vol. 2, Issue 1–2

A Survey on Policy Search for Robotics

Marc Peter Deisenroth, Technische Universität Darmstadt, Germany, marc@ias.tu-darmstadt.de
Gerhard Neumann, Technische Universität Darmstadt, Germany, neumann@ias.tu-darmstadt.de
Jan Peters, Technische Universität Darmstadt, Germany, peters@ias.tu-darmstadt.de
 
Suggested Citation
Marc Peter Deisenroth, Gerhard Neumann and Jan Peters (2013), "A Survey on Policy Search for Robotics", Foundations and Trends® in Robotics: Vol. 2: No. 1–2, pp 1-142. http://dx.doi.org/10.1561/2300000021

Published: 30 Aug 2013
© 2013 M. P. Deisenroth, G. Neumann and J. Peters
 
Subjects
Artificial Intelligence in Robotics, Planning and Control, Markov Decision Processes
 

In this article:
1. Introduction
2. Model-free Policy Search
3. Model-based Policy Search
4. Conclusion and Discussion
Acknowledgments
A. Gradients of Frequently Used Policies
B. Weighted ML Estimates of Frequently Used Policies
C. Derivations of the Dual Functions for REPS
References

Abstract

Policy search is a subfield of reinforcement learning which focuses on finding good parameters for a given policy parameterization. It is well suited for robotics as it can cope with high-dimensional state and action spaces, one of the main challenges in robot learning. We review recent successes of both model-free and model-based policy search in robot learning.

Model-free policy search is a general approach to learn policies based on sampled trajectories. We classify model-free methods based on their policy evaluation strategy, policy update strategy, and exploration strategy and present a unified view on existing algorithms. Learning a policy is often easier than learning an accurate forward model, and, hence, model-free methods are more frequently used in practice. However, for each sampled trajectory, it is necessary to interact with the robot, which can be time consuming and challenging in practice. Model-based policy search addresses this problem by first learning a simulator of the robot's dynamics from data. Subsequently, the simulator generates trajectories that are used for policy learning. For both model-free and model-based policy search methods, we review their respective properties and their applicability to robotic systems.
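To make the model-free idea above concrete, here is a minimal toy sketch (not from the survey; the system, cost, and update rule are invented for illustration). It samples Gaussian perturbations of a single policy parameter, evaluates each candidate by a rollout, and keeps only improvements, i.e., policy evaluation by sampled trajectories and exploration directly in parameter space:

```python
import random

def rollout(theta, x0=1.0, horizon=20):
    """One sampled episode with the linear policy u = theta*x on a toy
    1-D system x' = x + u; returns the episode return (negative cost)."""
    x, ret = x0, 0.0
    for _ in range(horizon):
        u = theta * x
        x = x + u
        ret -= x * x + 0.1 * u * u   # quadratic state and action cost
    return ret

def policy_search(theta=0.0, iters=200, sigma=0.1, seed=0):
    """Minimal model-free policy search: Gaussian exploration in
    parameter space with a greedy update (keep a perturbation only
    if its sampled return improves on the best seen so far)."""
    rng = random.Random(seed)
    best = rollout(theta)
    for _ in range(iters):
        candidate = theta + rng.gauss(0.0, sigma)
        ret = rollout(candidate)
        if ret > best:
            theta, best = candidate, ret
    return theta, best
```

Note that every candidate evaluation costs one interaction with the (here simulated) system, which is exactly the sample-efficiency issue the abstract raises; the survey's actual algorithms use far more sophisticated evaluation and update strategies than this greedy hill-climber.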

DOI:10.1561/2300000021
ISBN: 978-1-60198-702-0 (print)
ISBN: 978-1-60198-703-7 (e-book)
146 pp.

A Survey on Policy Search for Robotics

Policy search is a subfield of Reinforcement Learning (RL) that focuses on finding good parameters for a given policy parameterization. It is well suited for robotics as it can cope with high-dimensional state and action spaces, which is one of the main challenges in robot learning.

A Survey on Policy Search for Robotics reviews recent successes of both model-free and model-based policy search in robot learning. Model-free policy search is a general approach to learn policies based on sampled trajectories. This text classifies model-free methods based on their policy evaluation, policy update, and exploration strategies, and presents a unified view of existing algorithms. Learning a policy is often easier than learning an accurate forward model, and, hence, model-free methods are more frequently used in practice. However, for each sampled trajectory, it is necessary to interact with the robot, which can be time consuming and challenging in practice. Model-based policy search addresses this problem by first learning a simulator of the robot's dynamics from data. Subsequently, the simulator generates trajectories that are used for policy learning.

For both model-free and model-based policy search methods, A Survey on Policy Search for Robotics reviews their respective properties and their applicability to robotic systems. It is an invaluable reference for anyone working in the area.
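The model-based pipeline described above can likewise be sketched in miniature (again purely illustrative; the dynamics, data-collection scheme, and policy class are all invented, not taken from the survey): fit a dynamics model to sampled transitions, then evaluate and improve the policy entirely in the learned simulator instead of on the robot:

```python
import random

def true_step(x, u):
    """Hidden 'real robot' dynamics; the learner only sees samples."""
    return 0.9 * x + 0.5 * u

def collect_data(n=200, seed=0):
    """Random interaction with the real system: (x, u, x') transitions."""
    rng = random.Random(seed)
    return [(x, u, true_step(x, u))
            for x, u in ((rng.uniform(-1, 1), rng.uniform(-1, 1))
                         for _ in range(n))]

def fit_model(data):
    """Least-squares fit of the linear model x' = a*x + b*u
    via the 2x2 normal equations."""
    sxx = sum(x * x for x, u, y in data)
    sxu = sum(x * u for x, u, y in data)
    suu = sum(u * u for x, u, y in data)
    sxy = sum(x * y for x, u, y in data)
    suy = sum(u * y for x, u, y in data)
    det = sxx * suu - sxu * sxu
    a = (sxy * suu - suy * sxu) / det
    b = (suy * sxx - sxy * sxu) / det
    return a, b

def sim_return(theta, a, b, x0=1.0, horizon=20):
    """Episode return of the linear policy u = theta*x, evaluated in
    the learned simulator (a, b) rather than on the real robot."""
    x, ret = x0, 0.0
    for _ in range(horizon):
        x = a * x + b * theta * x
        ret -= x * x
    return ret
```

With these noise-free invented dynamics the fit recovers a ≈ 0.9 and b ≈ 0.5, and even a crude grid search over theta in the simulator, e.g. `max((t / 10 for t in range(-30, 1)), key=lambda th: sim_return(th, a, b))`, finds a policy without any further robot interaction, which is the sample-efficiency argument for model-based policy search.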

 

Erratum for 3.3.2.2 Analytic policy gradients (available for journal subscribers)

Erratum | 2300000021_Erratum.pdf

Submitted By: Marc Deisenroth, Technische Universität Darmstadt, marc@ias.tu-darmstadt.de. Date Accepted: 9/23/2013

  • Description: The authors have corrected Equation (3.21) and the sentence that follows on p. 103 in this issue. If you have access to this journal article or e-book, please see the 'Erratum' link above for the correction.