Critic-Actor for Average Reward MDPs with Function Approximation: A Finite-Time Analysis
This research presents the first critic-actor algorithm with function approximation, together with a finite-time analysis, for the long-run average reward setting. The algorithm is designed to solve reinforcement learning problems by…