This paper presents the first critic-actor algorithm with function approximation, together with a finite-time analysis, in the long-run average reward setting. The algorithm combines policy-based (actor) and value-based (critic) updates, but with the usual actor-critic timescales reversed: the actor runs on the faster timescale and the critic on the slower one. The analysis proves optimal learning rates and yields a sample complexity of Õ(ε^{-2.08}) for the mean squared error of the critic to be upper bounded by ε. Numerical experiments on three benchmark settings demonstrate that the critic-actor algorithm competes well with the actor-critic algorithm.
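For intuition, here is a minimal sketch of what a two-timescale critic-actor loop of this kind could look like, assuming a tabular softmax actor, linear critic features, and a toy placeholder environment. The feature map, the environment dynamics, and the step-size exponents are all illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions, d = 5, 3, 4          # toy problem sizes (assumed)
phi = rng.normal(size=(n_states, d))      # critic features (assumed)
theta = np.zeros((n_states, n_actions))   # actor parameters (softmax policy)
w = np.zeros(d)                           # critic weights (linear V approx.)
rho = 0.0                                 # running average-reward estimate

def policy(s):
    p = np.exp(theta[s] - theta[s].max())
    return p / p.sum()

def env_step(s, a):
    # Placeholder dynamics: random next state, simple noisy reward.
    s_next = rng.integers(n_states)
    r = float(s == a) + 0.1 * rng.normal()
    return s_next, r

s = 0
for t in range(1, 10_000):
    # Critic-actor reverses the usual actor-critic timescales:
    # the actor gets the faster (larger) step size, the critic the
    # slower one. The exponents here are illustrative only.
    alpha = 1.0 / t**0.6    # faster step size -> actor
    beta = 1.0 / t**0.9     # slower step size -> critic

    p = policy(s)
    a = rng.choice(n_actions, p=p)
    s_next, r = env_step(s, a)

    # Average-reward TD error: delta = r - rho + V(s') - V(s).
    delta = r - rho + phi[s_next] @ w - phi[s] @ w

    # Actor update (fast): score-function policy-gradient step.
    grad_log = -p
    grad_log[a] += 1.0
    theta[s] += alpha * delta * grad_log

    # Critic update (slow): TD(0) on the linear value weights,
    # plus a running estimate of the average reward.
    w += beta * delta * phi[s]
    rho += beta * (r - rho)

    s = s_next
```

The only structural difference from a standard actor-critic loop is which recursion gets the faster step-size schedule; everything else above is a generic average-reward TD/policy-gradient skeleton.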

Publication date: 5 Feb 2024
Project Page: Not provided
Paper: https://arxiv.org/pdf/2402.01371