Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
Indexed incrossref
Abstract
A multi-armed bandit problem - or, simply, a bandit problem - is a sequential allocation problem defined by a set of actions. At each time step, a unit resource is allocated to an action and some observable payoff is obtained. The goal is to maximize the total payoff obtained in a sequence of allocations. The name bandit refers to the colloquial term for a slot machine (a "one-armed bandit" in American slang). In a casino, a sequential allocation problem is obtained when the player is facing many slot machines at once (a "multi-armed bandit"), and must repeatedly choose where to insert the next coin. Multi-armed bandit problems are the most basic examples of sequential decision problems with an…
Citation impact
1,541
total citations
- FWCI
- 27.04
- Percentile
- 100%
- References
- 192
Citations per year
Authors
1Topics & keywords
Topics
Keywords
- Multi-armed bandit
- Stochastic game
- Regret
- Computer science
- Mathematical optimization
- Simple (philosophy)
- Mathematical economics
- Operations research
No related works found for this paper.