Preprint | arXiv.org | Apr 21, 2026 | Green OA

Planning in entropy-regularized Markov decision processes and games

Google DeepMind (United Kingdom)

Indexed in: arXiv, DataCite

Abstract

We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the environment. SmoothCruiser makes use of the smoothness of the Bellman operator promoted by the regularization to achieve problem-independent sample complexity of order $\tilde{O}(1/\epsilon^4)$ for a desired accuracy $\epsilon$, whereas for non-regularized settings there are no known algorithms with guaranteed polynomial sample complexity in the worst case.
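
As a rough illustration of the smoothness the abstract refers to, the sketch below computes the standard entropy-regularized ("soft") maximum over a set of action values, which replaces the hard max of the usual Bellman backup with a log-sum-exp. This is not SmoothCruiser itself, and nothing here is taken from the paper: the function names `soft_max_value` and `soft_policy`, the temperature parameter `lam`, and the example action values are all assumptions made for illustration.

```python
import math

def soft_max_value(q_values, lam):
    """Soft maximum: lam * log(sum_a exp(q_a / lam)).

    `lam` is a hypothetical name for the regularization strength (must be > 0).
    """
    m = max(q_values)  # subtract the max for numerical stability
    return m + lam * math.log(sum(math.exp((q - m) / lam) for q in q_values))

def soft_policy(q_values, lam):
    """Softmax policy that attains the soft maximum above."""
    m = max(q_values)
    weights = [math.exp((q - m) / lam) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]

def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Made-up action values, for illustration only.
q = [1.0, 0.5, -0.2]

for lam in (1.0, 0.1, 0.01):
    v = soft_max_value(q, lam)
    pi = soft_policy(q, lam)
    # The soft value equals expected action value plus lam times policy entropy.
    regularized = sum(p * qa for p, qa in zip(pi, q)) + lam * entropy(pi)
    print(f"lam={lam}: soft value={v:.4f}, regularized objective={regularized:.4f}")
```

The soft value always lies between `max(q)` and `max(q) + lam * log(len(q))`, so it approaches the hard maximum as `lam` shrinks while remaining differentiable in the action values for any `lam > 0`.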

Topics & keywords

Keywords
  • Markov decision process
  • Mathematical optimization
  • Sample complexity
  • Entropy
  • Computer science
  • Markov process
  • Markov chain
  • Regularization