Planning in entropy-regularized Markov decision processes and games
Google DeepMind (United Kingdom)
Indexed in: arXiv, DataCite
Abstract
We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the environment. SmoothCruiser makes use of the smoothness of the Bellman operator promoted by the regularization to achieve problem-independent sample complexity of order Õ(1/ε⁴) for a desired accuracy ε, whereas for non-regularized settings there are no known algorithms with guaranteed polynomial sample complexity in the worst case.
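The smoothness the abstract refers to can be illustrated with a small sketch. It assumes the common log-sum-exp form of the entropy-regularized backup, which the abstract itself does not spell out. This is not SmoothCruiser, which is sampling-based and uses a generative model; it is plain value iteration on a hypothetical two-state MDP with a known model. The names `soft_max`, `soft_value_iteration`, `P`, `R`, and the temperature `lam` are illustrative assumptions.

```python
import math


def soft_max(q_values, lam):
    """Entropy-regularized maximum: lam * log(sum(exp(q / lam))).

    Assumed form of the regularized backup. It is a smooth upper bound on
    max(q_values) and exceeds it by at most lam * log(len(q_values)).
    """
    m = max(q_values)  # subtract the max for numerical stability
    return m + lam * math.log(sum(math.exp((q - m) / lam) for q in q_values))


def soft_value_iteration(P, R, gamma, lam, n_iters=200):
    """Iterate the regularized Bellman operator on a known tabular MDP.

    P[s][a] is a list of (probability, next_state) pairs and R[s][a] is the
    reward for taking action a in state s (both hypothetical toy inputs).
    """
    V = [0.0] * len(P)
    for _ in range(n_iters):
        V = [
            soft_max(
                [
                    R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                    for a in range(len(P[s]))
                ],
                lam,
            )
            for s in range(len(P))
        ]
    return V


# Hypothetical toy MDP: 2 states, 2 actions per state.
P = [
    [[(1.0, 0)], [(1.0, 1)]],
    [[(0.5, 0), (0.5, 1)], [(1.0, 1)]],
]
R = [[0.0, 1.0], [0.5, 2.0]]

V_soft = soft_value_iteration(P, R, gamma=0.9, lam=0.1)
```

As `lam` shrinks toward zero, `soft_max` approaches the ordinary maximum and the iteration approaches standard, non-smooth value iteration.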
Citation impact
- Total citations: 4
- References: 7
Authors: 5
Topics & keywords
Keywords
- Markov decision process
- Mathematical optimization
- Sample complexity
- Entropy (arrow of time)
- Computer science
- Markov process
- Markov chain
- Regularization (linguistics)
UN Sustainable Development Goals
- Peace, Justice and strong institutions