Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems

Bubeck, Sébastien

doi:10.1561/9781601986276

booknow publishers, Inc. eBooksJan 1, 2012Closed access

Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems

SBSébastien Bubeck

Princeton University

Indexed incrossref

Abstract

A multi-armed bandit problem - or, simply, a bandit problem - is a sequential allocation problem defined by a set of actions. At each time step, a unit resource is allocated to an action and some observable payoff is obtained. The goal is to maximize the total payoff obtained in a sequence of allocations. The name bandit refers to the colloquial term for a slot machine (a "one-armed bandit" in American slang). In a casino, a sequential allocation problem is obtained when the player is facing many slot machines at once (a "multi-armed bandit"), and must repeatedly choose where to insert the next coin. Multi-armed bandit problems are the most basic examples of sequential decision problems with an…

Citation impact

1,541

total citations

FWCI: 27.04
Percentile: 100%
References: 192

Citations per year

Authors

1

SB
Sébastien BubeckCorresponding
Princeton University

Topics & keywords

Topics

Keywords

Multi-armed bandit
Stochastic game
Regret
Computer science
Mathematical optimization
Simple (philosophy)
Mathematical economics
Operations research

No related works found for this paper.