The study introduces SMORe, a novel approach to offline Goal-Conditioned Reinforcement Learning (GCRL). GCRL is a key route to generalist agents: it lets them learn diverse skills from existing datasets without hand-engineered reward functions. However, existing GCRL approaches often underperform in the offline setting. SMORe addresses this by combining the occupancy-matching perspective of GCRL with a convex dual formulation. Instead of a standard value function, it learns scores, or unnormalized densities, that represent the importance of taking an action at a state for reaching a particular goal. In the authors' experiments on robot manipulation and locomotion tasks, SMORe significantly outperforms competing methods.
Publication date: 03 Nov 2023
arXiv abstract page: https://arxiv.org/abs/2311.02013v1
Paper: https://arxiv.org/pdf/2311.02013