Global Optimality Guarantees for Policy Gradient Methods

Bhandari, Jalaj; Russo, Daniel

doi:10.1287/opre.2021.0014

Article; Operations Research; Jan 5, 2024; Closed access

Jalaj Bhandari; Daniel Russo

Columbia University

Indexed in Crossref

Abstract

Policy gradient methods, which have powered a lot of recent success in reinforcement learning, search for an optimal policy in a parameterized policy class by performing stochastic gradient descent on the cumulative expected cost-to-go under some initial state distribution. Although widely used, these methods lack theoretical guarantees as the optimization objective is typically nonconvex even for simple control problems, and hence are understood to only converge to a stationary point. In “Global Optimality Guarantees for Policy Gradient Methods,” J. Bhandari and D. Russo identify structural properties of the underlying MDP that guarantee that despite nonconvexity, the optimization objective has no suboptimal…

Citation impact

Total citations: 113
FWCI: 17.42
Percentile: 100%
References: 123

Authors: 2

Topics & keywords

Keywords

Mathematical optimization
Stationary point
Markov decision process
Optimal control
Gradient method
Convergence (economics)
Gradient descent
Mathematics

UN Sustainable Development Goals

Peace, Justice and Strong Institutions
