Sample Complexity of Preference-Based Nonparametric Off-Policy Evaluation with Deep Networks
The study examines off-policy evaluation (OPE) in reinforcement learning using human preference data. The authors analyze the sample efficiency of preference-based OPE and establish a statistical guarantee for it. They approach…