The article addresses bandit learning in many-to-one matching markets, aiming to improve the regret of existing algorithms while ensuring incentive compatibility. Earlier methods were far from optimal and lacked incentive-compatibility guarantees. The authors propose the adaptively explore-then-deferred-acceptance (AETDA) algorithm. The results improve substantially on previous methods, offer robust assurances, and apply to more general markets. The study also considers the broader class of substitutable preferences, a general condition that ensures a stable matching exists and that covers responsiveness as a special case.
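For orientation, the deferred-acceptance step that AETDA's name refers to can be illustrated on a many-to-one market whose preferences are already known. The block below is a minimal sketch of the classical proposer-side deferred acceptance procedure with capacities, under responsive preferences. It is not the AETDA algorithm itself, which must additionally learn preferences from bandit feedback. The function name `deferred_acceptance`, the "player" and "arm" labels, and the example preference lists are all illustrative assumptions, not taken from the paper.

```python
from typing import Dict, List


def deferred_acceptance(
    player_prefs: Dict[str, List[str]],
    arm_prefs: Dict[str, List[str]],
    capacity: Dict[str, int],
) -> Dict[str, List[str]]:
    """Player-proposing deferred acceptance where each arm holds up to
    capacity[arm] players (hypothetical helper, known preferences)."""
    # rank[a][p] is the position of player p in arm a's list (lower is better).
    rank = {a: {p: i for i, p in enumerate(prefs)} for a, prefs in arm_prefs.items()}
    next_choice = {p: 0 for p in player_prefs}  # index of p's next proposal
    held: Dict[str, List[str]] = {a: [] for a in arm_prefs}
    free = list(player_prefs)

    while free:
        p = free.pop()
        if next_choice[p] >= len(player_prefs[p]):
            continue  # p has exhausted its list and stays unmatched
        a = player_prefs[p][next_choice[p]]
        next_choice[p] += 1
        if p not in rank[a]:
            free.append(p)  # arm a finds p unacceptable; p proposes again later
            continue
        # Arm a tentatively holds p, then keeps only its best `capacity[a]` players.
        held[a].append(p)
        held[a].sort(key=rank[a].__getitem__)
        if len(held[a]) > capacity[a]:
            free.append(held[a].pop())  # reject the least-preferred held player
    return held


# Illustrative market: three players, arm a1 with two slots, arm a2 with one.
matching = deferred_acceptance(
    player_prefs={"p1": ["a1", "a2"], "p2": ["a1", "a2"], "p3": ["a1", "a2"]},
    arm_prefs={"a1": ["p3", "p1", "p2"], "a2": ["p1", "p2", "p3"]},
    capacity={"a1": 2, "a2": 1},
)
print(matching)  # {'a1': ['p3', 'p1'], 'a2': ['p2']}
```

In this example every player prefers a1, but a1 ranks p2 last, so p2 is rejected and ends up at a2. The result is stable: p2 would rather be at a1, yet a1 prefers both players it holds.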
Publication date: 3 Jan 2024
Project Page: arXiv:2401.01528v1 [cs.LG]
Paper: https://arxiv.org/pdf/2401.01528