This article introduces a new algorithm, based on the damped Newton method, for the H∞ tracking control problem of unknown continuous-time nonlinear systems. The algorithm, named model-free policy iteration (PI), is derived from a generalized tracking Bellman equation and can find the optimal solution of the tracking Hamilton-Jacobi-Isaacs (HJI) equation. Two PI reinforcement learning methods, on-policy and off-policy, are detailed. The off-policy PI algorithm does not require prior knowledge of the system dynamics. The effectiveness of the algorithm is demonstrated in simulation on a nonlinear system.
Publication date: 2023-12-10
Project Page: http://ieeexplore.ieee.org
Paper: https://arxiv.org/pdf/2401.12882