Sample Complexity Bounds for Stochastic Shortest Path with a Generative Model
Meta (Israel) · Afterschool Alliance · +1 more institution
Abstract
We study the sample complexity of learning an $ε$-optimal policy in the Stochastic Shortest Path (SSP) problem. We first derive sample complexity bounds when the learner has access to a generative model. We show that there exists a worst-case SSP instance with $S$ states, $A$ actions, minimum cost $c_{\min}$, and maximum expected cost of the optimal policy over all states $B_{\star}$, where any algorithm requires at least $Ω(SAB_{\star}^3/(c_{\min}ε^2))$ samples to return an $ε$-optimal policy with high probability. Surprisingly, this implies that whenever $c_{\min} = 0$ an SSP problem may not be learnable, thus revealing that learning in SSPs is strictly harder than in the finite-horizon and discounted…
Citation impact
- FWCI
- —
- Percentile
- —
- References
- 13
Authors
4Topics & keywords
- Regret
- Shortest path problem
- Mathematical optimization
- Generalization
- Computer science
- Sample (material)
- Oracle
- Variance reduction