Abstract

In this paper, we introduce pilco, a practical, data-efficient model-based policy search method. Pilco reduces model bias, one of the key problems of model-based reinforcement learning, in a principled way. By learning a probabilistic dynamics model and explicitly incorporating model uncertainty into long-term planning, pilco can cope with very little data and facilitates learning from scratch in only a few trials. Policy evaluation is performed in closed form using state-of-the-art approximate inference. Furthermore, policy gradients are computed analytically for policy improvement. We report unprecedented learning efficiency on challenging and high-dimensional control tasks.
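To make the idea of a probabilistic dynamics model concrete, the following is a minimal sketch of Gaussian process regression in NumPy. The abstract does not name the model class; the full paper uses Gaussian processes, and everything else here (the toy sine data, the fixed kernel hyperparameters, and the helper names `se_kernel` and `gp_predict`) is an assumption made for illustration, not the authors' implementation. The point it shows is that the predictive variance stays small near observed data and grows back toward the prior far from it, which is the uncertainty a planner can take into account.

```python
import numpy as np

def se_kernel(a, b, lengthscale=1.0, signal_var=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    diff = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (diff / lengthscale) ** 2)

def gp_predict(x_train, y_train, x_test, noise_var=1e-2):
    """Posterior mean and variance of a zero-mean GP at x_test."""
    # Kernel matrix of the training inputs plus observation noise.
    K = se_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_s = se_kernel(x_train, x_test)
    # Cholesky factorisation for a numerically stable solve.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    # Prior variance minus the reduction explained by the data.
    var = np.diag(se_kernel(x_test, x_test)) - np.sum(v ** 2, axis=0)
    return mean, var

# Hypothetical toy data standing in for observed state transitions.
x_train = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
y_train = np.sin(x_train)

# One query inside the data range, one far outside it.
x_test = np.array([0.25, 4.0])
mean, var = gp_predict(x_train, y_train, x_test)
print("predictive mean:", mean)
print("predictive variance:", var)
```

A deterministic model would return only the mean at both query points and treat the extrapolated prediction at 4.0 as equally trustworthy; the variance is what distinguishes the two cases.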

Citation impact

Total citations: 1,078
FWCI: 37.86
Percentile: 100%
References: 25

Authors

2

Topics & keywords

Keywords
  • Computer science
  • Reinforcement learning
  • Inference
  • Probabilistic logic
  • Key (lock)
  • Artificial intelligence
  • Machine learning
  • Policy learning