This article summarizes a Bayesian approach to off-policy evaluation (OPE) and off-policy learning (OPL) in large action spaces. The authors propose sDM, a unified Bayesian framework that leverages correlations between actions without compromising computational efficiency, and they introduce Bayesian metrics that assess average performance across multiple problem instances. Experiments on both OPE and OPL demonstrate the benefit of exploiting these action correlations. The running example is online advertising, where the context is a user's features, the action is the product to recommend, and the reward is a click (so the expected reward is the click-through rate). The authors highlight that existing methods often degrade as the action space grows, whereas the proposed method remains strong.
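To make the setup concrete, below is a minimal sketch of off-policy evaluation with a direct method on logged bandit data, where a shared Gaussian prior pools information across actions. This is only an illustration of the general idea of sharing strength across correlated actions; the variable names (`lambda_prior`, `dm_value`, the data shapes) are hypothetical and this is not the paper's actual sDM implementation or its structured prior.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical logged bandit data: contexts (user features), actions chosen
# by the logging policy (products shown), and observed binary rewards (clicks).
n, d, K = 5000, 8, 100            # log size, context dimension, number of actions
X = rng.normal(size=(n, d))       # contexts
A = rng.integers(0, K, size=n)    # logged actions
R = rng.binomial(1, 0.1, size=n)  # logged rewards (clicks)

# Per-action ridge regression; the shared Gaussian prior (lambda_prior) is a
# crude stand-in for the structured prior that couples correlated actions.
lambda_prior = 1.0
theta = np.zeros((K, d))
for a in range(K):
    idx = A == a
    Xa, Ra = X[idx], R[idx]
    # MAP estimate under an isotropic Gaussian prior; falls back toward the
    # prior mean when an action has little or no logged data.
    theta[a] = np.linalg.solve(Xa.T @ Xa + lambda_prior * np.eye(d), Xa.T @ Ra)

def dm_value(pi, X, theta):
    """Direct-method estimate of a target policy's value.

    pi(x) returns a length-K vector of action probabilities for context x.
    """
    q = X @ theta.T                        # predicted reward for every (x, a)
    probs = np.apply_along_axis(pi, 1, X)  # target policy probabilities
    return float(np.mean(np.sum(probs * q, axis=1)))

# Example target policy: uniform over all actions.
uniform_pi = lambda x: np.ones(K) / K
print("estimated value:", dm_value(uniform_pi, X, theta))
```

In this sketch the prior simply regularizes each action independently; the point of sDM is that a structured prior over action parameters lets sparse feedback on one action inform estimates for related actions.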


Publication date: 23 Feb 2024
Project Page: Not provided
Paper: https://arxiv.org/pdf/2402.14664