Planning in entropy-regularized Markov decision processes and games

Grill, Jean-Bastien; Domingues, Omar Darwiche; Ménard, Pierre; Munos, Rémi; Vaľko, Michal

doi:10.48550/arxiv.2604.19695

Preprint · arXiv.org · Apr 21, 2026 · Green OA

Google DeepMind (United Kingdom)

Indexed in: arXiv, DataCite

Abstract

We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the environment. SmoothCruiser makes use of the smoothness of the Bellman operator promoted by the regularization to achieve problem-independent sample complexity of order $\tilde{\mathcal{O}}(1/\epsilon^4)$ for a desired accuracy $\epsilon$, whereas for non-regularized settings there are no known algorithms with guaranteed polynomial sample complexity in the worst case.
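
To make the smoothness claim concrete, here is a minimal Python sketch of the standard entropy-regularized ("soft") maximum, in which the hard max over action values is replaced by a log-sum-exp. It illustrates the general technique only and is not SmoothCruiser itself; the function names and the temperature parameter `lam` are assumptions introduced for this example and do not come from the paper.

```python
import math


def soft_max_value(q_values, lam):
    """Entropy-regularized aggregate of action values (hypothetical helper).

    Computes lam * log(sum_a exp(q_a / lam)), shifting by the largest
    value first so the exponentials cannot overflow.
    """
    m = max(q_values)
    return m + lam * math.log(sum(math.exp((q - m) / lam) for q in q_values))


def soft_policy(q_values, lam):
    """Softmax distribution over actions: the gradient of soft_max_value."""
    m = max(q_values)
    weights = [math.exp((q - m) / lam) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    q = [1.0, 2.0, 3.0]
    for lam in (1.0, 0.1, 0.001):
        print(lam, soft_max_value(q, lam), soft_policy(q, lam))
```

As `lam` shrinks, the soft value approaches the hard maximum of the action values, and it never exceeds that maximum by more than `lam * log(K)` for `K` actions. Unlike the hard max, the soft version is differentiable everywhere, which is the kind of smoothness the abstract refers to.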

Citation impact

Total citations: 4

FWCI: —
Percentile: —
References: 7

Authors: 5

Topics & keywords

Keywords

Markov decision process
Mathematical optimization
Sample complexity
Entropy (arrow of time)
Computer science
Markov process
Markov chain
Regularization (linguistics)

UN Sustainable Development Goals

Peace, Justice and Strong Institutions
