articleScholarlyCommons (University of Pennsylvania)Jan 1, 2008GREEN OA

Stochastic Linear Optimization Under Bandit Feedback

University of Chicago · Toyota Technological Institute at Chicago

Abstract

In the classical stochastic k-armed bandit problem, in each of a sequence of rounds, a decision maker chooses one of k arms and incurs a cost chosen from an unknown distribution associated with that arm. In the linear optimization analog of this problem, rather than finitely many arms, the decision set is a compact subset of R n and the cost of each decision is just the evaluation of a randomly chosen linear cost function at that point. As before, it is assumed that the cost functions are sampled independently from an unknown but time-invariant distribution. The goal is to minimize the total cost incurred over some number of rounds T, and success is measured by low regret. Auer [2003] was the first to study…

Citation impact

619
total citations
FWCI
11.39
Percentile
100%
References
14
Citations per year

Authors

3

Topics & keywords

Keywords
  • Regret
  • Mathematics
  • Mathematical optimization
  • Curse of dimensionality
  • Sequence (biology)
  • Exponential function
  • Polytope
  • Upper and lower bounds
UN Sustainable Development Goals
  • Peace, Justice and strong institutions
No related works found for this paper.