The paper by Jon Schneider and Julian Zimmert of Google Research studies contextual bandit algorithms in the cross-learning setting, where the learner observes the loss of the action it plays under every possible context, not only the context that was actually realized. The authors give an efficient algorithm that resolves an open problem posed by Balseiro et al., achieving a nearly tight regret bound that does not depend on the number of contexts. The algorithm relies on a novel technique for coordinating the execution of a learning algorithm across multiple epochs, which may also be of interest for other learning problems that involve estimating an unknown context distribution.
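
To make the cross-learning feedback model concrete, here is a minimal Python sketch of an importance-weighted loss estimate that uses the extra feedback. It is not the paper's algorithm: it assumes the context distribution is known, which is the simpler case, whereas the paper is concerned with an unknown distribution. The function name `cross_learning_estimates` and all parameter names are hypothetical, chosen for illustration only.

```python
import numpy as np

def cross_learning_estimates(policy, context_dist, action, observed_losses):
    """Build loss estimates from one round of cross-learning feedback.

    policy:          (C, K) array, policy[c, a] = probability of action a in context c
    context_dist:    (C,) array of context probabilities (ASSUMED KNOWN in this sketch)
    action:          index of the action that was played this round
    observed_losses: (C,) array, loss of the played action under every context
    """
    num_contexts, num_actions = policy.shape
    # Probability of playing `action`, averaged over the context distribution.
    marginal = float(context_dist @ policy[:, action])
    estimates = np.zeros((num_contexts, num_actions))
    # Only the played action gets a nonzero estimate, but in every context at once.
    estimates[:, action] = observed_losses / marginal
    return estimates

# Toy example: 2 contexts, 2 actions, uniform policy and uniform contexts.
policy = np.full((2, 2), 0.5)
context_dist = np.array([0.5, 0.5])
estimates = cross_learning_estimates(policy, context_dist, 0, np.array([0.2, 0.4]))
```

Because the played action's loss is revealed for all contexts, the estimate divides by the action's probability averaged over contexts rather than by its probability in the single realized context. With a known distribution this estimate is unbiased for each context; when the distribution must be estimated, that averaged probability is no longer available exactly.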

Publication date: 3 Jan 2024
Project Page: https://arxiv.org/abs/2401.01857v1
Paper: https://arxiv.org/pdf/2401.01857