Article · Operations Research · Jan 5, 2024 · Closed access

Global Optimality Guarantees for Policy Gradient Methods

Columbia University

Indexed in Crossref

Abstract

Policy gradient methods, which have powered much of the recent success in reinforcement learning, search for an optimal policy in a parameterized policy class by performing stochastic gradient descent on the cumulative expected cost-to-go under some initial state distribution. Although widely used, these methods lack theoretical guarantees: the optimization objective is typically nonconvex even for simple control problems, so they are understood to converge only to a stationary point. In “Global Optimality Guarantees for Policy Gradient Methods,” J. Bhandari and D. Russo identify structural properties of the underlying MDP that guarantee that, despite nonconvexity, the optimization objective has no suboptimal…
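The abstract describes policy gradient methods as gradient descent on the expected cost-to-go ℓ(θ) = Σ_s ρ(s) J_θ(s), where J_θ is the discounted cost-to-go of the policy with parameters θ and ρ is the initial state distribution. The Python sketch below is a minimal illustration under stated assumptions, not the authors' algorithm or analysis. It assumes a hypothetical two-state, two-action MDP and a tabular softmax policy, and it swaps in exact gradients computed from the policy gradient theorem in place of the stochastic gradient estimates the abstract refers to. All names (`COST`, `NEXT_STATE`, `evaluate`, `policy_gradient_descent`, and so on) are hypothetical. Value iteration computes the optimal cost separately so the result of gradient descent can be compared against it.

```python
import math

# Hypothetical two-state, two-action MDP (illustrative only, not from the paper).
# Action 0 moves to state 1, action 1 moves to state 0, from either state.
# Being in state 0 costs 1 per step; being in state 1 costs 0.
GAMMA = 0.9
COST = [[1.0, 1.0],  # c(s=0, a) for a = 0, 1
        [0.0, 0.0]]  # c(s=1, a) for a = 0, 1
NEXT_STATE = [1, 0]  # deterministic transition: action a leads to NEXT_STATE[a]
RHO = [0.5, 0.5]     # initial state distribution
N_S, N_A = 2, 2


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for k in range(col, n + 1):
                a[r][k] -= factor * a[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (a[r][n] - sum(a[r][k] * x[k] for k in range(r + 1, n))) / a[r][r]
    return x


def evaluate(theta):
    """Return the loss sum_s rho(s) J(s) and its exact gradient w.r.t. the logits theta[s][a]."""
    pi = [softmax(theta[s]) for s in range(N_S)]
    p_pi = [[0.0] * N_S for _ in range(N_S)]  # state-to-state transitions under pi
    for s in range(N_S):
        for a in range(N_A):
            p_pi[s][NEXT_STATE[a]] += pi[s][a]
    c_pi = [sum(pi[s][a] * COST[s][a] for a in range(N_A)) for s in range(N_S)]
    # Cost-to-go J solves (I - gamma * P_pi) J = c_pi.
    j = solve([[float(s == t) - GAMMA * p_pi[s][t] for t in range(N_S)] for s in range(N_S)], c_pi)
    # Discounted occupancy d(s) = sum_t gamma^t Pr(s_t = s) solves (I - gamma * P_pi^T) d = rho.
    d = solve([[float(s == t) - GAMMA * p_pi[t][s] for t in range(N_S)] for s in range(N_S)], RHO)
    loss = sum(RHO[s] * j[s] for s in range(N_S))
    grad = [[0.0] * N_A for _ in range(N_S)]
    for s in range(N_S):
        q = [COST[s][a] + GAMMA * j[NEXT_STATE[a]] for a in range(N_A)]
        for a in range(N_A):
            # Policy gradient theorem, tabular softmax: d(s) * pi(a|s) * (Q(s,a) - J(s)).
            grad[s][a] = d[s] * pi[s][a] * (q[a] - j[s])
    return loss, grad


def optimal_loss(sweeps=500):
    """Optimal loss via value iteration, used only for comparison."""
    v = [0.0] * N_S
    for _ in range(sweeps):
        v = [min(COST[s][a] + GAMMA * v[NEXT_STATE[a]] for a in range(N_A)) for s in range(N_S)]
    return sum(RHO[s] * v[s] for s in range(N_S))


def policy_gradient_descent(step_size=1.0, iterations=2000):
    """Plain gradient descent on the logits, starting from the uniform policy."""
    theta = [[0.0] * N_A for _ in range(N_S)]
    history = []
    for _ in range(iterations):
        loss, grad = evaluate(theta)
        history.append(loss)
        theta = [[theta[s][a] - step_size * grad[s][a] for a in range(N_A)] for s in range(N_S)]
    history.append(evaluate(theta)[0])
    return theta, history


if __name__ == "__main__":
    _, history = policy_gradient_descent()
    print(f"initial loss {history[0]:.4f}, final loss {history[-1]:.6f}, optimum {optimal_loss():.6f}")
```

Running the script prints the cost of the uniform starting policy, the cost after gradient descent, and the optimal cost from value iteration. Exact gradients keep the sketch deterministic; replacing `evaluate` with Monte Carlo estimates of the same gradient would give a stochastic variant closer to what the abstract describes.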

Citation impact

Total citations: 113
FWCI (Field-Weighted Citation Impact): 17.42
Percentile: 100%
References: 123

Authors: 2

Topics & keywords

Keywords
  • Mathematical optimization
  • Stationary point
  • Markov decision process
  • Optimal control
  • Gradient method
  • Convergence (economics)
  • Gradient descent
  • Mathematics
UN Sustainable Development Goals
  • Peace, Justice and Strong Institutions