Bayesian Off-Policy Evaluation and Learning for Large Action Spaces
This article presents a Bayesian approach to off-policy evaluation (OPE) and off-policy learning (OPL) for large action spaces. The authors propose a unified Bayesian framework, sDM, that leverages action correlations…