Bayesian policy gradient and actor-critic algorithms
Indexed in: arXiv, DataCite
Abstract
Policy gradient methods are reinforcement learning algorithms that adapt a parameterized policy by following a performance gradient estimate. Conventional policy gradient methods use Monte-Carlo techniques to estimate the gradient, which tend to have high variance, requiring many samples and resulting in slow convergence. We first propose a Bayesian framework for policy gradient, based on modeling the policy gradient as a Gaussian process. This reduces the number of samples needed to obtain accurate gradient estimates. Moreover, estimates of the natural gradient and a measure of the uncertainty in the gradient estimates, namely, the gradient covariance, are provided at little extra cost. Since the proposed…
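To make the variance problem mentioned in the abstract concrete, here is a minimal sketch of a conventional Monte Carlo (score-function) policy gradient estimate. It is not the Bayesian method the paper proposes. The one-step problem, the unit-variance Gaussian policy, the quadratic reward, and all names (`TARGET`, `reward`, `mc_policy_gradient`, `estimator_spread`) are assumptions chosen for illustration only.

```python
import random
import statistics

TARGET = 3.0  # hypothetical reward peak, chosen only for illustration


def reward(action):
    """Toy one-step reward: highest when the action equals TARGET."""
    return -(action - TARGET) ** 2


def true_gradient(theta):
    """Exact gradient of the expected reward for this toy problem.

    With action ~ N(theta, 1), E[reward] = -((theta - TARGET)^2 + 1),
    so the derivative with respect to theta is -2 * (theta - TARGET).
    """
    return -2.0 * (theta - TARGET)


def mc_policy_gradient(theta, n_samples, rng):
    """Monte Carlo score-function estimate of the policy gradient."""
    total = 0.0
    for _ in range(n_samples):
        action = rng.gauss(theta, 1.0)
        score = action - theta  # d/dtheta of log N(action; theta, 1)
        total += reward(action) * score
    return total / n_samples


def estimator_spread(theta, n_samples, n_repeats, rng):
    """Standard deviation of repeated estimates at a fixed sample size."""
    estimates = [
        mc_policy_gradient(theta, n_samples, rng) for _ in range(n_repeats)
    ]
    return statistics.stdev(estimates)


if __name__ == "__main__":
    rng = random.Random(0)
    print("true gradient:", true_gradient(0.0))
    for n in (10, 100, 1000):
        print(n, "samples, spread of estimate:", estimator_spread(0.0, n, 200, rng))
```

In this toy setting the spread of the estimate shrinks only slowly as the number of samples grows, which is the cost that sample-efficient gradient estimators aim to reduce.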
Citation impact
- Total citations: 34
- FWCI: not available
- Percentile: not available
- References: 86
Authors: 3
Topics & keywords
Keywords
- Reinforcement learning
- Covariance
- Computer science
- Gaussian process
- Gradient method
- Mathematical optimization
- Markov decision process
- Algorithm
UN Sustainable Development Goals
- Peace, Justice and Strong Institutions