Trading off rewards and errors in multi-armed bandits

Erraqabi, Akram; Lazaric, Alessandro; Vaľko, Michal; Brunskill, Emma; Liu, Yun-En

doi:10.48550/arxiv.2605.00488

Preprint · arXiv (Cornell University) · May 1, 2026 · Green OA

Université de Montréal · Laboratoire d'Informatique de Paris-Nord · Carnegie Mellon University

Indexed in: arXiv, DataCite

Abstract

In multi-armed bandits, an arm's mean is estimated most accurately when the arm is pulled often, whereas reward maximization concentrates pulls on the best arm. We study the tradeoff between accurately estimating the arm means and accumulating reward, and present an algorithm with regret guarantees that interpolates between the two objectives. We provide both upper and lower bounds and validate the results empirically.

Authors

Akram Erraqabi (corresponding author)
Université de Montréal
Alessandro Lazaric
Michal Vaľko
Emma Brunskill
Laboratoire d'Informatique de Paris-Nord, Carnegie Mellon University
Yun-En Liu
Laboratoire d'Informatique de Paris-Nord, Carnegie Mellon University

Topics & keywords

Topics

Advanced Bandit Algorithms Research (100%)
Online Learning and Analytics (99%)
Data Stream Mining Techniques (99%)

Keywords

Computer science
Maximization
Utility maximization
Artificial intelligence
Machine learning
Mathematical optimization
Economics

Funding

Carnegie Mellon University
Agence Nationale de la Recherche
Awards: ANR-16-CE23-0003, ANR-14-CE24-0010, CE23-0003
Ministère de l'Éducation Nationale, de l'Enseignement Supérieur et de la Recherche