The Nonstochastic Multiarmed Bandit Problem

Auer, Peter; Cesa‐Bianchi, Nicolò; Freund, Yoav; Schapire, Robert E.

doi:10.1137/s0097539701398375

articleSIAM Journal on ComputingJan 1, 2002Closed access

The Nonstochastic Multiarmed Bandit Problem

PAPeter Auer NCNicolò Cesa‐Bianchi YFYoav Freund RERobert E. Schapire

Graz University of Technology · University of Milan · +2 more institutions

Indexed incrossref

Abstract

In the multiarmed bandit problem, a gambler must decide which arm of K nonidentical slot machines to play in a sequence of trials so as to maximize his reward. This classical problem has received much attention because of the simple model it provides of the trade-off between exploration (trying out each arm to find the best one) and exploitation (playing the arm believed to give the best payoff). Past solutions for the bandit problem have almost always relied on assumptions about the statistics of the slot machines. In this work, we make no statistical assumptions whatsoever about the nature of the process generating the payoffs of the slot machines. We give a solution to the bandit problem in which an…

Citation impact

2,242

total citations

FWCI: 17.13
Percentile: 100%
References: 19

Citations per year

Authors

4

Topics & keywords

Topics

Keywords

Stochastic game
Minimax
Sequence (biology)
Multi-armed bandit
Mathematical optimization
Set (abstract data type)
Upper and lower bounds
Computer science

UN Sustainable Development Goals

Decent work and economic growth

No related works found for this paper.