Efficient learning by implicit exploration in bandit problems with side observations

Kocák, Tomáš; Neu, Gergely; Vaľko, Michal; Munos, Rémi

doi:10.48550/arxiv.2604.24555

Preprint · arXiv.org · Apr 27, 2026 · Green OA

Indexed in: arXiv, DataCite

Abstract

We consider online learning problems under a partial observability model capturing situations where the information conveyed to the learner is between full information and bandit feedback. In the simplest variant, we assume that in addition to its own loss, the learner also gets to observe losses of some other actions. The revealed losses depend on the learner's action and a directed observation system chosen by the environment. For this setting, we propose the first algorithm that enjoys near-optimal regret guarantees without having to know the observation system before selecting its actions. Along similar lines, we also define a new partial information setting that models online combinatorial optimization…
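The feedback model described in the abstract can be made concrete with a small sketch. It is an illustration only, not the authors' code: the function names, the `graph` dictionary encoding, the always-present self-observation, and the exact form of the smoothed estimate (revealed loss divided by observation probability plus a parameter `gamma`) are assumptions made here, motivated by the "implicit exploration" in the title. Note that the observation system is consulted only after the action has been chosen, in line with the abstract's claim.

```python
def revealed_losses(losses, graph, action):
    """Return {arm: loss} for the played arm and every arm it observes.

    graph[i] is the set of arms whose losses are revealed when arm i is
    played (a directed observation system). Assumption: the played arm
    always observes its own loss.
    """
    observed = set(graph.get(action, ())) | {action}
    return {j: losses[j] for j in observed}


def ix_estimates(losses, graph, probs, action, gamma):
    """Loss estimates smoothed by gamma (assumed implicit-exploration form).

    The graph is only used here, after `action` was drawn from `probs`.
    """
    n = len(probs)
    revealed = revealed_losses(losses, graph, action)
    estimates = [0.0] * n
    for i in range(n):
        # Probability that arm i gets observed: mass of all arms revealing i.
        o_i = sum(probs[j] for j in range(n)
                  if j == i or i in graph.get(j, ()))
        if i in revealed:
            # Adding gamma > 0 biases the estimate downward and keeps it bounded.
            estimates[i] = revealed[i] / (o_i + gamma)
    return estimates


# Hypothetical 3-arm round: arm 0 reveals arm 1, arm 2 reveals arms 0 and 1.
losses = [0.2, 0.5, 0.9]
graph = {0: {1}, 1: set(), 2: {0, 1}}
probs = [0.5, 0.3, 0.2]
est = ix_estimates(losses, graph, probs, action=0, gamma=0.1)
```

With `gamma = 0` the estimate reduces to ordinary importance weighting; a positive `gamma` trades a small bias for lower variance without mixing explicit exploration into `probs`.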

Citation impact

Total citations: 130

FWCI: —
Percentile: —
References: 16

Authors: 4

Topics & keywords

Keywords

Regret
Observability
Computer science
Action (physics)
Online learning
Mathematical optimization
Artificial intelligence
Theoretical computer science

UN Sustainable Development Goals

Quality Education

Funding

Ministère de l'Education Nationale, de l'Enseignement Superieur et de la Recherche
Award: 270327