Parallel Q-Learning: Scaling Off-policy Reinforcement Learning under Massively Parallel Simulation
The paper introduces Parallel Q-Learning (PQL), a technique that optimizes off-policy reinforcement learning for large-scale GPU-based simulations. PQL is designed to leverage the superior sample efficiency of off-policy learning while…
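To make "off-policy learning under parallel simulation" concrete, here is a minimal sketch of generic off-policy Q-learning. It does not reproduce PQL's own design. It shows the two properties the summary relies on: many environments step in lockstep to fill a shared replay buffer, and the learner reuses stored transitions even though they came from a different behaviour policy.

Several pieces are hypothetical and chosen only for illustration:

- The toy chain environment (`env_step`).
- The tabular Q-table standing in for a neural network.
- The function `train` and all hyperparameters.

```python
import random
from collections import deque

N_STATES = 6        # hypothetical chain MDP: states 0..5, state 5 is the terminal goal
ACTIONS = (0, 1)    # 0 = move left, 1 = move right


def env_step(state, action):
    """Hypothetical toy environment: reward 1.0 on reaching the goal, else 0.0."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def train(num_envs=64, iters=200, batch_size=256, gamma=0.9, lr=0.1, seed=0):
    """Generic off-policy Q-learning with a shared replay buffer (illustrative only)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # tabular Q(s, a)
    replay = deque(maxlen=50_000)               # replay buffer shared by all environments
    states = [0] * num_envs                     # current state of each parallel environment

    for _ in range(iters):
        # 1) Collection: every environment takes one step "in parallel".
        #    The behaviour policy is uniform random, deliberately unlike the greedy target policy.
        for i, s in enumerate(states):
            a = rng.choice(ACTIONS)
            s2, r, done = env_step(s, a)
            replay.append((s, a, r, s2, done))
            states[i] = 0 if done else s2       # auto-reset environments that finished

        # 2) Learning: replay a minibatch of stored transitions (off-policy updates).
        for s, a, r, s2, done in rng.choices(replay, k=batch_size):
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += lr * (target - q[s][a])
    return q


q = train()
greedy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(greedy)
```

The behaviour data is purely random, yet the greedy policy read off the learned `q` moves right in every non-terminal state. Learning a target policy from data gathered by a different policy is what makes off-policy methods sample-efficient. Adding more parallel environments only enlarges the pool of transitions that can be replayed.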