Near-optimal Regret Bounds for Reinforcement Learning

Jaksch, Thomas; Ortner, Ronald; Auer, Peter

articleMar 1, 2010Closed access

Near-optimal Regret Bounds for Reinforcement Learning

TJThomas Jaksch RORonald Ortner PAPeter Auer

Abstract

For undiscounted reinforcement learning in Markov decision processes (MDPs) we consider the total regret of a learning algorithm with respect to an optimal policy. In order to describe the transition structure of an MDP we propose a new parameter: An MDP has diameter D if for any pair of states s,s ′ there is a policy which moves from s to s ′ in at most D steps (on average). We present a reinforcement learning algorithm with total regret Õ(DS √ AT) after T steps for any unknown MDP with S states, A actions per state, and diameter D. A corresponding lower bound of Ω ( √ DSAT) on the total regret of any learning algorithm is given as well. These results are complemented by a sample complexity bound on the…

Citation impact

711

total citations

FWCI: 39.10
Percentile: 100%
References: 26

Citations per year

Authors

3

Topics & keywords

Topics

Keywords

Regret
Markov decision process
Reinforcement learning
Logarithm
Upper and lower bounds
Mathematical optimization
Sample complexity
Mathematics

UN Sustainable Development Goals

Peace, Justice and strong institutions

No related works found for this paper.