This paper introduces Bayesian Learning for Contextual Restless Multi-Armed Bandits (BCoR), an online reinforcement learning method for restless multi-armed bandits (RMABs). It targets public health settings in which limited resources must be allocated sequentially. BCoR combines Bayesian modeling with Thompson sampling to handle complex settings such as contextual and non-stationary RMABs. It is designed to learn unknown RMAB transition dynamics quickly under a budget constraint, and the authors report better performance than existing methods. One application discussed is a public health campaign in India.
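
To make the idea of Thompson sampling over unknown transition dynamics more concrete, here is a minimal sketch in Python. It is not the BCoR algorithm from the paper: it assumes two-state arms, independent Beta priors per arm, no context, stationary dynamics, and a simple myopic rule (act on the arms with the largest sampled one-step gain) in place of any index policy the authors may use. All names (`BetaTransitionPosterior`, `select_arms`, `run`) and the example probabilities are hypothetical.

```python
import random


class BetaTransitionPosterior:
    """Beta posteriors over P(next state = 1 | state, action) for one arm."""

    def __init__(self):
        # counts[state][action] = [alpha, beta]; Beta(1, 1) is an assumed uniform prior.
        self.counts = [[[1.0, 1.0] for _ in range(2)] for _ in range(2)]

    def sample(self, state, action):
        alpha, beta = self.counts[state][action]
        return random.betavariate(alpha, beta)

    def update(self, state, action, next_state):
        # A transition into state 1 increments alpha, otherwise beta.
        self.counts[state][action][0 if next_state == 1 else 1] += 1.0


def select_arms(posteriors, states, budget):
    """Thompson step: sample dynamics, then act on the arms with the largest sampled gain."""
    gains = []
    for arm, (post, state) in enumerate(zip(posteriors, states)):
        # Sampled benefit of acting (action 1) versus not acting (action 0).
        gain = post.sample(state, 1) - post.sample(state, 0)
        gains.append((gain, arm))
    gains.sort(reverse=True)
    return {arm for _, arm in gains[:budget]}


def run(true_probs, budget, horizon, seed=0):
    """Simulate the loop; true_probs[arm][state][action] = P(next state = 1)."""
    random.seed(seed)
    n_arms = len(true_probs)
    posteriors = [BetaTransitionPosterior() for _ in range(n_arms)]
    states = [0] * n_arms
    total_reward = 0
    for _ in range(horizon):
        chosen = select_arms(posteriors, states, budget)
        for arm in range(n_arms):
            action = 1 if arm in chosen else 0
            p_good = true_probs[arm][states[arm]][action]
            next_state = 1 if random.random() < p_good else 0
            # Every arm transitions each round (restless), so every arm is updated.
            posteriors[arm].update(states[arm], action, next_state)
            states[arm] = next_state
            total_reward += next_state  # reward 1 for each arm in the good state
    return total_reward, posteriors


if __name__ == "__main__":
    # Made-up dynamics for four arms, for illustration only.
    example_probs = [
        [[0.1, 0.9], [0.3, 0.95]],
        [[0.4, 0.5], [0.6, 0.7]],
        [[0.2, 0.3], [0.5, 0.5]],
        [[0.1, 0.6], [0.4, 0.8]],
    ]
    reward, _ = run(example_probs, budget=2, horizon=200)
    print("total reward:", reward)
```

Sampling from the posterior, rather than using its mean, is what drives exploration here: arms with few observations produce widely varying sampled gains and so are occasionally selected even when their current estimate looks poor.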

Publication date: 8 Feb 2024
Project Page: Not provided
Paper: https://arxiv.org/pdf/2402.04933