Using confidence bounds for exploitation-exploration trade-offs

Auer, Peter

articleMar 1, 2003Closed access

Using confidence bounds for exploitation-exploration trade-offs

Abstract

We show how a standard tool from statistics — namely confidence bounds — can be used to elegantly deal with situations which exhibit an exploitation-exploration trade-off. Our technique for designing and analyzing algorithms for such situations is general and can be applied when an algorithm has to make exploitation-versus-exploration decisions based on uncertain information provided by a random process. We apply our technique to two models with such an exploitation-exploration trade-off. For the adversarial bandit problem with shifting our new algorithm suffers only Õ � (ST) 1/2� regret with high probability over T trials with S shifts. Such a regret bound was previously known only in expectation. The second…

Citation impact

1,255

total citations

FWCI: 3.10
Percentile: 100%
References: 17

Citations per year

Authors

1

PA
Peter AuerCorresponding
Graz University of Technology

Topics & keywords

Topics

Keywords

Regret
Reinforcement learning
Computer science
Value (mathematics)
Process (computing)
Mathematical optimization
Adversarial system
Artificial intelligence

No related works found for this paper.