Global Optimality Guarantees for Policy Gradient Methods
Indexed in Crossref
Abstract
Policy gradient methods, which have powered much of the recent success in reinforcement learning, search for an optimal policy in a parameterized policy class by performing stochastic gradient descent on the cumulative expected cost-to-go under some initial state distribution. Although widely used, these methods lack theoretical guarantees because the optimization objective is typically nonconvex even for simple control problems, and they are hence understood to converge only to a stationary point. In “Global Optimality Guarantees for Policy Gradient Methods,” J. Bhandari and D. Russo identify structural properties of the underlying MDP that guarantee that, despite nonconvexity, the optimization objective has no suboptimal…
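To make the setup in the abstract concrete, here is a minimal sketch of gradient descent on the expected cost-to-go for a tiny tabular MDP. Everything in it is an assumption for illustration and none of it is taken from the paper: the two-state, two-action MDP (COST, P1), the discount factor, the uniform initial distribution RHO, and the per-state sigmoid parameterization. It also swaps the stochastic gradient for a central finite-difference gradient of the exactly computed objective, so it shows the optimization problem and is not a practical policy gradient implementation.

```python
import math

GAMMA = 0.9                      # discount factor (assumed)
RHO = [0.5, 0.5]                 # initial state distribution (assumed)
COST = [[1.0, 1.5], [0.0, 2.0]]  # COST[s][a]: one-step cost (hypothetical MDP)
P1 = [[0.1, 0.9], [0.9, 0.5]]    # P1[s][a]: probability that the next state is 1


def objective_from_probs(p):
    """Expected cost-to-go under RHO when action 1 is taken in state s w.p. p[s]."""
    c = [(1 - p[s]) * COST[s][0] + p[s] * COST[s][1] for s in range(2)]
    q = [(1 - p[s]) * P1[s][0] + p[s] * P1[s][1] for s in range(2)]
    # Policy evaluation: solve (I - GAMMA * P_pi) J = c with Cramer's rule (2x2).
    a00, a01 = 1 - GAMMA * (1 - q[0]), -GAMMA * q[0]
    a10, a11 = -GAMMA * (1 - q[1]), 1 - GAMMA * q[1]
    det = a00 * a11 - a01 * a10
    j0 = (c[0] * a11 - a01 * c[1]) / det
    j1 = (a00 * c[1] - a10 * c[0]) / det
    return RHO[0] * j0 + RHO[1] * j1


def objective(theta):
    """Objective as a function of the policy parameters (one sigmoid per state)."""
    return objective_from_probs([1 / (1 + math.exp(-t)) for t in theta])


def gradient(theta, eps=1e-5):
    """Central finite-difference gradient (stand-in for a stochastic estimate)."""
    grad = []
    for i in range(len(theta)):
        up, down = list(theta), list(theta)
        up[i] += eps
        down[i] -= eps
        grad.append((objective(up) - objective(down)) / (2 * eps))
    return grad


def gradient_descent(steps=5000, step_size=1.0):
    """Plain gradient descent on the objective, starting from the uniform policy."""
    theta = [0.0, 0.0]
    for _ in range(steps):
        g = gradient(theta)
        theta = [t - step_size * gi for t, gi in zip(theta, g)]
    return theta


def best_deterministic():
    """Brute-force baseline: best of the four deterministic policies."""
    return min(objective_from_probs([a0, a1])
               for a0 in (0.0, 1.0) for a1 in (0.0, 1.0))


if __name__ == "__main__":
    theta = gradient_descent()
    print("gradient descent objective:", objective(theta))
    print("best deterministic policy: ", best_deterministic())
```

Comparing the two printed values shows how close gradient descent gets to the enumerated optimum on this one toy instance. That is a sanity check on a hand-picked example, not evidence for the general result the paper analyzes.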
Citation impact
- Total citations: 113
- FWCI: 17.42
- Percentile: 100%
- References: 123
Authors (2): J. Bhandari, D. Russo
Topics & keywords
Keywords
- Mathematical optimization
- Stationary point
- Markov decision process
- Optimal control
- Gradient method
- Convergence (economics)
- Gradient descent
- Mathematics
UN Sustainable Development Goals
- Peace, Justice and Strong Institutions