Pessimistic Nonlinear Least-Squares Value Iteration for Offline Reinforcement Learning
The paper proposes Pessimistic Nonlinear Least-Squares Value Iteration (PNLSVI), an algorithm for offline reinforcement learning with nonlinear function approximation. The algorithm has three innovative components: a variance-based weighted regression scheme,…
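The summary names a variance-based weighted regression scheme but gives no detail. The sketch below is a generic illustration only and is not the paper's procedure, which works with a general nonlinear function class. It performs variance-weighted ridge regression over linear features: each sample's squared error is divided by an estimate of its variance, so low-noise targets influence the fit more than high-noise ones. The function names, the ridge parameter `lam`, and the variance floor `var_floor` are hypothetical choices made for this sketch.

```python
from typing import List, Sequence


def weighted_ridge_regression(
    features: Sequence[Sequence[float]],
    targets: Sequence[float],
    variances: Sequence[float],
    lam: float = 1.0,
    var_floor: float = 1e-2,
) -> List[float]:
    """Solve min_w sum_i (w . x_i - y_i)^2 / max(sigma_i^2, var_floor) + lam * ||w||^2.

    The ridge term `lam` and the variance floor `var_floor` are assumptions of
    this sketch; the floor keeps per-sample weights bounded.
    """
    d = len(features[0])
    # Normal equations: A = lam * I + sum_i w_i x_i x_i^T, b = sum_i w_i y_i x_i.
    a = [[lam if r == c else 0.0 for c in range(d)] for r in range(d)]
    b = [0.0] * d
    for x, y, var in zip(features, targets, variances):
        weight = 1.0 / max(var, var_floor)  # inverse-variance weight
        for r in range(d):
            b[r] += weight * y * x[r]
            for c in range(d):
                a[r][c] += weight * x[r] * x[c]
    return _solve(a, b)


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting (A is positive definite when lam > 0)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


# One low-variance target (1.0) and one high-variance target (10.0) on the same feature.
print(weighted_ridge_regression([[1.0], [1.0]], [1.0, 10.0], [0.01, 100.0], lam=1e-6))
```

With those inputs the fitted weight stays close to 1.0, because the low-variance sample carries about 10,000 times the weight of the noisy one; giving both samples equal variance instead yields a weight of roughly 5.5, the plain average.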