The research focuses on a key challenge in offline reinforcement learning (RL): its vulnerability to data corruption. The authors evaluate several offline RL algorithms under different types of corruption and find that Implicit Q-learning (IQL) is relatively resilient, though it remains susceptible to dynamics corruption. To address this, they propose Robust IQL (RIQL), which uses the Huber loss and quantile Q-estimators to balance the penalty on corrupted data against learning stability. Experiments show that RIQL stays robust across a variety of data corruption scenarios.
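As a rough illustration of these two ingredients, the sketch below shows how a Huber loss can replace the usual squared Bellman error and how a lower quantile over an ensemble of Q-estimates can stand in for a hard minimum. This is a minimal, hypothetical example rather than the authors' implementation: the function names, tensor shapes, ensemble size, and the `delta` and `alpha` values are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def huber_bellman_loss(q_pred, q_target, delta=1.0):
    """Huber loss on the Bellman residual: quadratic for small errors,
    linear for large ones, limiting the influence of corrupted transitions.
    (delta is an illustrative hyperparameter, not a value from the paper.)"""
    return F.huber_loss(q_pred, q_target, delta=delta)

def quantile_ensemble_target(q_values, alpha=0.25):
    """Aggregate an ensemble of Q-estimates with a lower quantile (alpha)
    instead of a hard minimum, trading pessimism toward corrupted data
    against learning stability. q_values: (num_ensemble, batch_size)."""
    return torch.quantile(q_values, alpha, dim=0)

# Illustrative usage with random tensors (ensemble of 5 critics, batch of 8)
q_ensemble = torch.randn(5, 8)                     # hypothetical Q-estimates
target = quantile_ensemble_target(q_ensemble)      # quantile-aggregated target
q_pred = torch.randn(8, requires_grad=True)        # predicted Q-values
loss = huber_bellman_loss(q_pred, target.detach())
loss.backward()
```

The quantile level controls how conservative the target is: lower values behave more like the pessimistic ensemble minimum, while higher values reduce the penalty on possibly corrupted transitions.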
Publication date: 20 Oct 2023
Project Page: https://arxiv.org/pdf/2310.12955v1.pdf
Paper: https://arxiv.org/pdf/2310.12955