Optimal cross-learning for contextual bandits with unknown context distributions

The paper by Jon Schneider and Julian Zimmert of Google Research studies contextual bandit algorithms in the cross-learning setting of Balseiro et al., where the learner observes the loss of the action it plays under every possible context, not just the context of the current round. The authors give an efficient algorithm that resolves an open problem posed by Balseiro et al.: its regret bound is nearly tight (up to logarithmic factors) and does not depend on the number of contexts. The algorithm relies on a novel technique for coordinating learning across multiple epochs, which may also be useful for other learning problems that require estimating an unknown context distribution.
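To make the cross-learning feedback model concrete, here is a minimal Python sketch. It is not the paper's algorithm: it assumes the context distribution `nu` is known, which is the simpler case, whereas the paper addresses the harder case where it is unknown. The function names, the exponential-weights learner, and the toy losses are all illustrative assumptions. The key line is the update loop, where the loss of the played action is used to update the estimates for every context.

```python
import math
import random


def exp_weights(cum_losses, eta):
    """Exponential-weights distribution over actions from cumulative loss estimates."""
    m = min(cum_losses)  # shift by the minimum for numerical stability
    w = [math.exp(-eta * (l - m)) for l in cum_losses]
    s = sum(w)
    return [x / s for x in w]


def run_cross_learning(losses, nu, eta, rng):
    """Toy learner with cross-learning feedback.

    losses[t][c][a] is the loss in [0, 1] of action a under context c in round t.
    nu[c] is the context probability, ASSUMED KNOWN here (illustrative only).
    Returns the total loss incurred and the per-context cumulative loss estimates.
    """
    n_contexts = len(nu)
    n_actions = len(losses[0][0])
    cum = [[0.0] * n_actions for _ in range(n_contexts)]
    total = 0.0
    for round_losses in losses:
        # Nature draws the context i.i.d. from nu.
        c_t = rng.choices(range(n_contexts), weights=nu)[0]
        policies = [exp_weights(cum[c], eta) for c in range(n_contexts)]
        a_t = rng.choices(range(n_actions), weights=policies[c_t])[0]
        total += round_losses[c_t][a_t]
        # Marginal probability of playing a_t, averaged over the context draw.
        q = sum(nu[c] * policies[c][a_t] for c in range(n_contexts))
        # Cross-learning: the loss of a_t is revealed under EVERY context,
        # so every context's estimate for a_t is updated, not only c_t's.
        for c in range(n_contexts):
            cum[c][a_t] += round_losses[c][a_t] / q
    return total, cum


if __name__ == "__main__":
    # Two contexts, two actions: action 0 is best in context 0, action 1 in context 1.
    T = 2000
    toy_losses = [[[0.0, 1.0], [1.0, 0.0]] for _ in range(T)]
    total, cum = run_cross_learning(toy_losses, nu=[0.5, 0.5], eta=0.05, rng=random.Random(0))
    print("total loss:", total)
```

Dividing by `q`, the probability of playing `a_t` averaged over contexts, keeps each loss estimate unbiased while avoiding a division by the per-context probability alone. This is what lets every context learn from every round, and it is also why the sketch needs `nu`.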

Publication date: 3 Jan 2024
Project Page: https://arxiv.org/abs/2401.01857v1
Paper: https://arxiv.org/pdf/2401.01857
