The article addresses prediction with expert advice under bandit feedback. The authors study a model in which the learner may abstain from play in any round, incurring neither reward nor loss. For this setting they introduce the Confidence-rated Bandits with Abstentions (CBA) algorithm, which significantly improves on the reward bounds of the classical EXP4 algorithm. These are the first bounds on the expected cumulative reward for general confidence-rated predictors. Preliminary experiments show that CBA improves over existing bandit algorithms.
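To make the interaction protocol concrete, the sketch below shows an EXP4-style exponential-weights learner extended with an abstain option, where abstaining yields no reward and no loss and leaves the expert weights unchanged. The confidence threshold, the randomly generated expert advice, the placeholder rewards, and all constants are illustrative assumptions; this is not the paper's CBA algorithm, only a minimal rendering of the abstention setting.

```python
import numpy as np

rng = np.random.default_rng(0)

K, N, T = 5, 10, 1000           # arms, experts, rounds (illustrative values)
gamma, threshold = 0.05, 0.25   # exploration rate, abstention confidence threshold (assumed)

weights = np.ones(N)            # exponential weights over experts

for t in range(T):
    # Each expert recommends a distribution over the K arms (random stand-ins here).
    advice = rng.dirichlet(np.ones(K), size=N)          # shape (N, K)

    # Mix expert advice by current weights and add uniform exploration (EXP4-style).
    p = weights @ advice / weights.sum()
    p = (1 - gamma) * p + gamma / K

    # Hypothetical abstention rule: play only if the mixture is confident enough.
    # Abstaining gives no reward and no loss, so the weights are not updated.
    if p.max() < threshold:
        continue

    arm = rng.choice(K, p=p)
    reward = float(rng.random() < 0.5)                  # placeholder bandit feedback in [0, 1]

    # Importance-weighted reward estimate for the chosen arm only (bandit feedback).
    r_hat = np.zeros(K)
    r_hat[arm] = reward / p[arm]

    # Exponential-weights update of each expert by its estimated reward.
    weights *= np.exp(gamma * (advice @ r_hat) / K)
```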

Publication date: 23 Feb 2024
Project Page: Not specified in the text
Paper: https://arxiv.org/pdf/2402.14585