PILCO: A Model-Based and Data-Efficient Approach to Policy Search
University of Washington · University of Cambridge
Abstract
In this paper, we introduce pilco, a practical, data-efficient model-based policy search method. Pilco reduces model bias, one of the key problems of model-based reinforcement learning, in a principled way. By learning a probabilistic dynamics model and explicitly incorporating model uncertainty into long-term planning, pilco can cope with very little data and facilitates learning from scratch in only a few trials. Policy evaluation is performed in closed form using state-of-the-art approximate inference. Furthermore, policy gradients are computed analytically for policy improvement. We report unprecedented learning efficiency on challenging and high-dimensional control tasks.
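To make the idea of a probabilistic dynamics model concrete, the following is a minimal sketch and not the authors' implementation. It assumes a Gaussian process with a squared-exponential kernel as the model (the abstract only says "probabilistic dynamics model"), and the toy dynamics, hyperparameters, and function names (`se_kernel`, `gp_predict`) are hypothetical. It covers one-step prediction only; it does not propagate uncertainty over long horizons or compute policy gradients.

```python
import numpy as np

def se_kernel(a, b, lengthscale=1.0, signal_var=1.0):
    """Squared-exponential kernel between rows of a (n, d) and b (m, d)."""
    sq_dist = (np.sum(a**2, axis=1)[:, None]
               + np.sum(b**2, axis=1)[None, :]
               - 2.0 * a @ b.T)
    return signal_var * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / lengthscale**2)

def gp_predict(x_train, y_train, x_test, noise_var=1e-2):
    """Posterior mean and variance of the latent function at x_test."""
    k_train = se_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    k_cross = se_kernel(x_test, x_train)
    chol = np.linalg.cholesky(k_train)
    # Solve (K + noise * I) alpha = y using the Cholesky factor.
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_cross @ alpha
    v = np.linalg.solve(chol, k_cross.T)
    var = np.diag(se_kernel(x_test, x_test)) - np.sum(v**2, axis=0)
    return mean, var

# Hypothetical toy data: inputs are (state, action), targets are state changes.
rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=(20, 1))
actions = rng.uniform(-1.0, 1.0, size=(20, 1))
inputs = np.hstack([states, actions])
targets = 0.1 * np.sin(3.0 * states[:, 0]) + 0.05 * actions[:, 0]

# Query once at a visited input and once far from all observed data.
mean_near, var_near = gp_predict(inputs, targets, inputs[:1])
mean_far, var_far = gp_predict(inputs, targets, np.array([[5.0, 5.0]]))
print("variance near data:", var_near[0])
print("variance far from data:", var_far[0])
```

The predictive variance is small where data has been observed and reverts to the prior variance far from it. That variance is the model uncertainty a planner can take into account, rather than trusting a single point prediction everywhere.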
Citation impact
- FWCI: 37.86
- Percentile: 100%
- References: 25
Authors: 2
Topics & keywords
- Computer science
- Reinforcement learning
- Inference
- Probabilistic logic
- Key (lock)
- Artificial intelligence
- Machine learning
- Policy learning