Bandits with Abstention under Expert Advice

The paper studies prediction with expert advice under bandit feedback. The authors consider a model in which one action corresponds to the learner abstaining from play, and that action incurs neither reward nor loss on any trial. They introduce the Confidence-rated Bandits with Abstentions (CBA) algorithm, which exploits this assumption to obtain reward bounds that can significantly improve on those of the classical EXP4 algorithm. According to the authors, this is the first work to achieve bounds on the expected cumulative reward for general confidence-rated predictors. Preliminary experiments suggest that CBA improves over existing bandit algorithms.
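To make the setting concrete, here is a minimal sketch of the classical EXP4 baseline run on a toy problem with an abstention action. It is not the CBA algorithm, whose details the summary does not give. The function names, the two toy experts, the convention that action 0 means "abstain", the rescaling of rewards from [-1, 1] to [0, 1], and the 30% win rate are all assumptions made for illustration.

```python
import math
import random

ABSTAIN = 0  # assumed convention: action 0 is abstention and always yields reward 0


def exp4_probs(weights, advice, gamma):
    """Mix the experts' action distributions by weight, then add uniform exploration."""
    k = len(advice[0])
    total = sum(weights)
    return [
        (1 - gamma) * sum(w * xi[a] for w, xi in zip(weights, advice)) / total
        + gamma / k
        for a in range(k)
    ]


def exp4_update(weights, advice, probs, action, reward, gamma):
    """Importance-weighted EXP4 update; reward in [-1, 1] is rescaled to [0, 1]."""
    k = len(probs)
    scaled = (reward + 1.0) / 2.0
    estimate = scaled / probs[action]  # unbiased estimate for the played action
    return [
        w * math.exp(gamma * xi[action] * estimate / k)
        for w, xi in zip(weights, advice)
    ]


def run(rounds=2000, gamma=0.1, seed=0):
    """Toy loop: one expert always abstains, the other always plays."""
    rng = random.Random(seed)
    advice = [[1.0, 0.0], [0.0, 1.0]]  # distributions over [abstain, play]
    weights = [1.0, 1.0]
    total_reward = 0.0
    for _ in range(rounds):
        probs = exp4_probs(weights, advice, gamma)
        action = rng.choices(range(len(probs)), weights=probs)[0]
        if action == ABSTAIN:
            reward = 0.0
        else:
            # Assumed environment: playing wins 30% of the time, so it loses on average.
            reward = 1.0 if rng.random() < 0.3 else -1.0
        total_reward += reward
        weights = exp4_update(weights, advice, probs, action, reward, gamma)
    return weights, total_reward
```

Calling run() returns the final expert weights and the cumulative reward. In this toy environment abstaining is the better choice on average, so the abstaining expert's weight tends to grow faster. The sketch only shows the baseline that CBA is compared against.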

Publication date: 23 Feb 2024
Paper: https://arxiv.org/pdf/2402.14585
