The paper examines the challenges offline reinforcement learning (RL) faces with imbalanced datasets, i.e., datasets dominated by suboptimal trajectories. It finds that current offline RL algorithms tend to imitate these suboptimal actions because their policy constraints keep the learned policy close to all trajectories in the dataset, good and bad alike. To overcome this, the researchers propose a weighted sampling strategy that constrains the policy to the ‘good data’ rather than to all actions in the dataset. The resulting method plugs into standard offline RL algorithms and yields significant performance gains on imbalanced datasets.
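Conceptually, the strategy amounts to re-weighting how transitions are drawn during training instead of sampling them uniformly. Below is a minimal, illustrative sketch in Python of one such re-weighting, using a softmax over trajectory returns; the paper's own weighting scheme may differ (see the project page for the released code), and all names here (`trajectory_return_weights`, `sample_batch`, `temperature`) are hypothetical, not the authors' API.

```python
import numpy as np

def trajectory_return_weights(returns, temperature=1.0):
    # Hypothetical weighting: softmax over trajectory returns, so
    # higher-return trajectories get exponentially larger sampling
    # probability. Subtracting the max keeps exp() numerically stable.
    z = (np.asarray(returns, dtype=np.float64) - np.max(returns)) / temperature
    w = np.exp(z)
    return w / w.sum()

def sample_batch(trajectories, returns, batch_size, temperature=1.0, rng=None):
    # trajectories: list of dicts of aligned arrays, e.g. {"obs": ..., "act": ...}.
    # Draw trajectories with return-weighted probability, then a random
    # timestep within each, instead of sampling transitions uniformly.
    rng = np.random.default_rng() if rng is None else rng
    probs = trajectory_return_weights(returns, temperature)
    batch = []
    for _ in range(batch_size):
        traj = trajectories[rng.choice(len(trajectories), p=probs)]
        t = rng.integers(len(traj["obs"]))
        batch.append({k: v[t] for k, v in traj.items()})
    return batch
```

As `temperature` shrinks toward zero, sampling concentrates on the highest-return trajectories; a large `temperature` recovers near-uniform sampling over trajectories.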

Publication date: 6 Oct 2023
Project Page: https://github.com/Improbable-AI/dw-offline-rl
Paper: https://arxiv.org/pdf/2310.04413