This paper presents the first critic-actor algorithm with function approximation, together with a finite-time analysis, in the long-run average reward setting. The algorithm combines policy-based (actor) and value-based (critic) updates, but with the usual actor-critic timescales reversed: the actor runs on the faster timescale and the critic on the slower one. The analysis proves optimal learning rates and yields a sample complexity of Õ(ε^{-2.08}) for the mean squared error of the critic to be upper bounded by ε. Numerical experiments on three benchmark settings demonstrate that the critic-actor algorithm competes well with the actor-critic algorithm.
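For intuition, here is a minimal sketch of what a two-timescale critic-actor loop of this kind could look like, assuming a tabular softmax actor, linear critic features, and a toy placeholder environment. The feature map, the environment dynamics, and the step-size exponents are all illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions, d = 5, 3, 4          # toy problem sizes (assumed)
phi = rng.normal(size=(n_states, d))      # critic features (assumed)
theta = np.zeros((n_states, n_actions))   # actor parameters (softmax policy)
w = np.zeros(d)                           # critic weights (linear V approx.)
rho = 0.0                                 # running average-reward estimate

def policy(s):
    p = np.exp(theta[s] - theta[s].max())
    return p / p.sum()

def env_step(s, a):
    # Placeholder dynamics: random next state, simple noisy reward.
    s_next = rng.integers(n_states)
    r = float(s == a) + 0.1 * rng.normal()
    return s_next, r

s = 0
for t in range(1, 10_000):
    # Critic-actor reverses the usual actor-critic timescales:
    # the actor gets the faster (larger) step size, the critic the
    # slower one. The exponents here are illustrative only.
    alpha = 1.0 / t**0.6    # faster step size -> actor
    beta = 1.0 / t**0.9     # slower step size -> critic

    p = policy(s)
    a = rng.choice(n_actions, p=p)
    s_next, r = env_step(s, a)

    # Average-reward TD error: delta = r - rho + V(s') - V(s).
    delta = r - rho + phi[s_next] @ w - phi[s] @ w

    # Actor update (fast): score-function policy-gradient step.
    grad_log = -p
    grad_log[a] += 1.0
    theta[s] += alpha * delta * grad_log

    # Critic update (slow): TD(0) on the linear value weights,
    # plus a running estimate of the average reward.
    w += beta * delta * phi[s]
    rho += beta * (r - rho)

    s = s_next
```

The only structural difference from a standard actor-critic loop is which recursion gets the faster step-size schedule; everything else above is a generic average-reward TD/policy-gradient skeleton.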

Publication date: 5 Feb 2024
Project Page: Not provided
Paper: https://arxiv.org/pdf/2402.01371