The paper introduces Parallel Q-Learning (PQL), a scheme for running off-policy reinforcement learning on large-scale GPU-based simulations. PQL aims to keep the sample efficiency of off-policy learning while beating on-policy methods in wall-clock training time. It does this by running data collection, policy learning, and value learning concurrently rather than one after another. This concurrent design sets PQL apart from earlier distributed off-policy learning efforts and suits it to massively parallel GPU-based simulation.
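
The idea of three activities running at the same time can be pictured with a toy sketch. The code below is not the authors' implementation: it uses plain Python threads and a shared replay buffer, and every name in it (`ReplayBuffer`, `run_parallel`, `collect`, `learn`) is hypothetical. The "learners" only count the batches they draw instead of taking gradient steps, and the "environment" just emits random numbers. It shows only the structure: one worker fills the buffer while two others sample from it concurrently.

```python
import random
import threading
from collections import deque


class ReplayBuffer:
    """Thread-safe buffer shared by the collector and both learners (hypothetical)."""

    def __init__(self, capacity):
        self._data = deque(maxlen=capacity)
        self._lock = threading.Lock()

    def add(self, transition):
        with self._lock:
            self._data.append(transition)

    def sample(self, batch_size):
        with self._lock:
            if len(self._data) < batch_size:
                return []  # not enough data collected yet
            return random.sample(list(self._data), batch_size)


def run_parallel(num_steps=200, batch_size=8):
    """Run one collector and two learners concurrently; return activity counts."""
    if num_steps < batch_size:
        raise ValueError("num_steps must be at least batch_size")

    buffer = ReplayBuffer(capacity=1000)
    # Each counter is written by exactly one thread, so no extra lock is needed.
    counts = {"collected": 0, "value_updates": 0, "policy_updates": 0}
    collection_done = threading.Event()

    def collect():
        # Stand-in for stepping many simulated environments.
        for step in range(num_steps):
            buffer.add((step, random.random()))
            counts["collected"] += 1
        collection_done.set()

    def learn(key):
        # Keep sampling while data is still arriving; guarantee at least one update.
        while not collection_done.is_set() or counts[key] == 0:
            if buffer.sample(batch_size):
                counts[key] += 1  # a real learner would take a gradient step here

    threads = [
        threading.Thread(target=collect),
        threading.Thread(target=learn, args=("value_updates",)),
        threading.Thread(target=learn, args=("policy_updates",)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts


if __name__ == "__main__":
    print(run_parallel())
```

The number of learner updates varies from run to run because the threads are scheduled independently. That variability is the point of the sketch: neither learner waits for collection to finish before it starts consuming data.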

Publication date: July 24, 2023
Project Page: https://github.com/Improbable-AI/pql
Paper: https://arxiv.org/pdf/2307.12983.pdf