preprintArXiv.orgApr 30, 2026GREEN OA

Bayesian policy gradient and actor-critic algorithms

Machine Science

Indexed inarxivdatacite

Abstract

Policy gradient methods are reinforcement learning algorithms that adapt a parameterized policy by following a performance gradient estimate. Conventional policy gradient methods use Monte-Carlo techniques to estimate the gradient, which tend to have high variance, requiring many samples and resulting in slow convergence. We first propose a Bayesian framework for policy gradient, based on modeling the policy gradient as a Gaussian process. This reduces the number of samples needed to obtain accurate gradient estimates. Moreover, estimates of the natural gradient and a measure of the uncertainty in the gradient estimates, namely, the gradient covariance, are provided at little extra cost. Since the proposed…

Citation impact

34
total citations
FWCI
Percentile
References
86
Citations per year

Authors

3

Topics & keywords

Keywords
  • Reinforcement learning
  • Covariance
  • Computer science
  • Gaussian process
  • Gradient method
  • Mathematical optimization
  • Markov decision process
  • Algorithm
UN Sustainable Development Goals
  • Peace, Justice and strong institutions
No related works found for this paper.