Stochastic Linear Optimization Under Bandit Feedback

Dani, Varsha; Hayes, Thomas P.; Kakade, Sham M.

articleScholarlyCommons (University of Pennsylvania)Jan 1, 2008GREEN OA

Stochastic Linear Optimization Under Bandit Feedback

VDVarsha Dani TPThomas P. Hayes SMSham M. Kakade

University of Chicago · Toyota Technological Institute at Chicago

Abstract

In the classical stochastic k-armed bandit problem, in each of a sequence of rounds, a decision maker chooses one of k arms and incurs a cost chosen from an unknown distribution associated with that arm. In the linear optimization analog of this problem, rather than finitely many arms, the decision set is a compact subset of R n and the cost of each decision is just the evaluation of a randomly chosen linear cost function at that point. As before, it is assumed that the cost functions are sampled independently from an unknown but time-invariant distribution. The goal is to minimize the total cost incurred over some number of rounds T, and success is measured by low regret. Auer [2003] was the first to study…

Citation impact

619

total citations

FWCI: 11.39
Percentile: 100%
References: 14

Citations per year

Authors

3

Topics & keywords

Topics

Keywords

Regret
Mathematics
Mathematical optimization
Curse of dimensionality
Sequence (biology)
Exponential function
Polytope
Upper and lower bounds

UN Sustainable Development Goals

Peace, Justice and strong institutions

No related works found for this paper.