Bayesian policy gradient and actor-critic algorithms

Ghavamzadeh, Mohammad; Engel, Yaakov; Vaľko, Michal

doi:10.48550/arxiv.2604.27563

Preprint | arXiv.org | Apr 30, 2026 | Green OA

Machine Science

Indexed in: arXiv, DataCite

Abstract

Policy gradient methods are reinforcement learning algorithms that adapt a parameterized policy by following a performance gradient estimate. Conventional policy gradient methods use Monte-Carlo techniques to estimate the gradient, which tend to have high variance, requiring many samples and resulting in slow convergence. We first propose a Bayesian framework for policy gradient, based on modeling the policy gradient as a Gaussian process. This reduces the number of samples needed to obtain accurate gradient estimates. Moreover, estimates of the natural gradient and a measure of the uncertainty in the gradient estimates, namely, the gradient covariance, are provided at little extra cost. Since the proposed…
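To make the abstract's starting point concrete, here is a minimal sketch of the conventional Monte Carlo baseline it describes, not of the Bayesian method the paper proposes. Everything in it is an assumption chosen for illustration: a one-step problem, a Gaussian policy `a ~ Normal(theta, 1)`, a hypothetical reward `-(a - 2)^2`, and the helper names. It uses the standard score-function (likelihood-ratio) estimator and measures how much the gradient estimate scatters across repeated runs.

```python
import random
import statistics


def reward(action):
    # Hypothetical reward for illustration only: highest at action == 2.
    return -(action - 2.0) ** 2


def mc_policy_gradient(theta, num_samples, rng):
    """Score-function Monte Carlo estimate of dJ/dtheta.

    Assumes a one-step problem with policy a ~ Normal(theta, 1).
    """
    total = 0.0
    for _ in range(num_samples):
        action = rng.gauss(theta, 1.0)
        score = action - theta  # d/dtheta of log Normal(action; theta, 1)
        total += reward(action) * score
    return total / num_samples


def true_gradient(theta):
    # J(theta) = E[-(a - 2)^2] = -((theta - 2)^2 + 1),
    # so dJ/dtheta = -2 * (theta - 2).
    return -2.0 * (theta - 2.0)


def estimator_spread(theta, num_samples, repeats, seed=0):
    """Mean and standard deviation of the estimate over repeated runs."""
    rng = random.Random(seed)
    estimates = [
        mc_policy_gradient(theta, num_samples, rng) for _ in range(repeats)
    ]
    return statistics.mean(estimates), statistics.stdev(estimates)


if __name__ == "__main__":
    theta = 0.0
    print("true gradient:", true_gradient(theta))
    for n in (10, 1000):
        mean, spread = estimator_spread(theta, n, repeats=200)
        print(f"samples={n:5d}  mean={mean:.3f}  std={spread:.3f}")
```

In this toy setting the estimator is unbiased, but its spread shrinks only with the square root of the sample count, which is the sample-inefficiency the abstract attributes to Monte Carlo gradient estimation.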

Citation impact

Total citations: 34
FWCI: —
Percentile: —
References: 86

Authors: 3

Topics & keywords

Keywords

Reinforcement learning
Covariance
Computer science
Gaussian process
Gradient method
Mathematical optimization
Markov decision process
Algorithm

UN Sustainable Development Goals

Peace, Justice and Strong Institutions
