articleMar 1, 2003Closed access
Using confidence bounds for exploitation-exploration trade-offs
Abstract
We show how a standard tool from statistics — namely confidence bounds — can be used to elegantly deal with situations which exhibit an exploitation-exploration trade-off. Our technique for designing and analyzing algorithms for such situations is general and can be applied when an algorithm has to make exploitation-versus-exploration decisions based on uncertain information provided by a random process. We apply our technique to two models with such an exploitation-exploration trade-off. For the adversarial bandit problem with shifting our new algorithm suffers only Õ � (ST) 1/2� regret with high probability over T trials with S shifts. Such a regret bound was previously known only in expectation. The second…
Citation impact
1,255
total citations
- FWCI
- 3.10
- Percentile
- 100%
- References
- 17
Citations per year
Authors
1Topics & keywords
Topics
Keywords
- Regret
- Reinforcement learning
- Computer science
- Value (mathematics)
- Process (computing)
- Mathematical optimization
- Adversarial system
- Artificial intelligence
No related works found for this paper.