articleMar 1, 2003Closed access

Using confidence bounds for exploitation-exploration trade-offs

Graz University of Technology

Abstract

We show how a standard tool from statistics — namely confidence bounds — can be used to elegantly deal with situations which exhibit an exploitation-exploration trade-off. Our technique for designing and analyzing algorithms for such situations is general and can be applied when an algorithm has to make exploitation-versus-exploration decisions based on uncertain information provided by a random process. We apply our technique to two models with such an exploitation-exploration trade-off. For the adversarial bandit problem with shifting our new algorithm suffers only Õ � (ST) 1/2� regret with high probability over T trials with S shifts. Such a regret bound was previously known only in expectation. The second…

Citation impact

1,255
total citations
FWCI
3.10
Percentile
100%
References
17
Citations per year

Authors

1

Topics & keywords

Keywords
  • Regret
  • Reinforcement learning
  • Computer science
  • Value (mathematics)
  • Process (computing)
  • Mathematical optimization
  • Adversarial system
  • Artificial intelligence
No related works found for this paper.