The article discusses the use of deep generative models in offline reinforcement learning and the computational challenges posed by their large size. To address this, the authors propose a knowledge-distillation method based on data augmentation: high-return trajectories are sampled from a conditional diffusion model and blended with the original trajectories through a novel stitching algorithm that relies on a new reward generator. Applying behavioral cloning to the resulting dataset yields a shallow policy that matches or outperforms deep generative planners on several D4RL benchmarks.
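
The sketch below illustrates the overall pipeline under heavy assumptions: the diffusion sampler, reward generator, stitching rule, and dataset here are all hypothetical placeholders (random stand-ins and a nearest-state splice), not the paper's actual models or criterion. It only shows the shape of the approach: generate return-conditioned trajectories, label and stitch them onto the offline data, then fit a shallow policy by behavioral cloning.

```python
import numpy as np

rng = np.random.default_rng(0)
HORIZON, OBS_DIM, ACT_DIM = 50, 4, 2

def make_trajectories(n):
    """Placeholder offline dataset; real use would load D4RL trajectories."""
    return [
        {
            "observations": rng.normal(size=(HORIZON, OBS_DIM)),
            "actions": rng.normal(size=(HORIZON, ACT_DIM)),
            "rewards": rng.normal(size=HORIZON),
        }
        for _ in range(n)
    ]

def sample_conditional_diffusion(n, target_return):
    """Placeholder for the return-conditioned diffusion sampler; generated
    trajectories come without reward labels."""
    return [
        {
            "observations": rng.normal(size=(HORIZON, OBS_DIM)),
            "actions": rng.normal(size=(HORIZON, ACT_DIM)),
        }
        for _ in range(n)
    ]

def reward_generator(traj):
    """Placeholder for the learned reward generator that labels generated transitions."""
    return rng.normal(size=len(traj["observations"]))

def stitch(orig, gen, threshold=2.0):
    """Illustrative stitching rule: splice the generated tail onto the original
    prefix at the closest pair of states (the paper's criterion may differ)."""
    d = np.linalg.norm(
        orig["observations"][:, None] - gen["observations"][None, :], axis=-1
    )
    i, j = np.unravel_index(d.argmin(), d.shape)
    if d[i, j] > threshold:
        return None
    return {
        "observations": np.concatenate([orig["observations"][: i + 1], gen["observations"][j:]]),
        "actions": np.concatenate([orig["actions"][: i + 1], gen["actions"][j:]]),
        "rewards": np.concatenate([orig["rewards"][: i + 1], reward_generator(gen)[j:]]),
    }

# Build the augmented dataset and fit a shallow behavioral-cloning policy
# (here, a simple least-squares linear map from observations to actions).
dataset = make_trajectories(100)
generated = sample_conditional_diffusion(100, target_return=1.0)
augmented = dataset + [t for o, g in zip(dataset, generated) if (t := stitch(o, g)) is not None]

X = np.concatenate([t["observations"] for t in augmented])
Y = np.concatenate([t["actions"] for t in augmented])
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
policy = lambda obs: obs @ W
print("BC policy weights shape:", W.shape)
```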

Publication date: 2 Feb 2024
Project Page: Not Provided
Paper: https://arxiv.org/pdf/2402.00807