Minimax optimal instance-dependent regret Papers

Pessimistic Nonlinear Least-Squares Value Iteration for Offline Reinforcement Learning

root, October 3, 2023

The paper proposes an algorithm, Pessimistic Nonlinear Least-Squares Value Iteration (PNLSVI), for offline reinforcement learning with nonlinear function approximation. The algorithm includes three innovative components: a variance-based weighted regression scheme,…
