Stochastic Linear Optimization Under Bandit Feedback
University of Chicago · Toyota Technological Institute at Chicago
Abstract
In the classical stochastic k-armed bandit problem, in each of a sequence of rounds, a decision maker chooses one of k arms and incurs a cost chosen from an unknown distribution associated with that arm. In the linear optimization analog of this problem, rather than finitely many arms, the decision set is a compact subset of R n and the cost of each decision is just the evaluation of a randomly chosen linear cost function at that point. As before, it is assumed that the cost functions are sampled independently from an unknown but time-invariant distribution. The goal is to minimize the total cost incurred over some number of rounds T, and success is measured by low regret. Auer [2003] was the first to study…
Citation impact
- FWCI
- 11.39
- Percentile
- 100%
- References
- 14
Authors
3Topics & keywords
- Regret
- Mathematics
- Mathematical optimization
- Curse of dimensionality
- Sequence (biology)
- Exponential function
- Polytope
- Upper and lower bounds
- Peace, Justice and strong institutions