The article discusses the use of deep generative models in offline reinforcement learning and the computational challenges posed by their large size. To address this, the authors propose a knowledge-distillation method based on data augmentation: high-return trajectories are sampled from a conditional diffusion model and blended with the original trajectories through a novel stitching algorithm that relies on a new reward generator. Applying behavioral cloning to the resulting dataset yields a shallow policy that matches or outperforms deep generative planners on several D4RL benchmarks.
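
The sketch below illustrates the overall pipeline under heavy assumptions: the diffusion sampler, reward generator, stitching rule, and dataset here are all hypothetical placeholders (random stand-ins and a nearest-state splice), not the paper's actual models or criterion. It only shows the shape of the approach: generate return-conditioned trajectories, label and stitch them onto the offline data, then fit a shallow policy by behavioral cloning.

```python
import numpy as np

rng = np.random.default_rng(0)
HORIZON, OBS_DIM, ACT_DIM = 50, 4, 2

def make_trajectories(n):
    """Placeholder offline dataset; real use would load D4RL trajectories."""
    return [
        {
            "observations": rng.normal(size=(HORIZON, OBS_DIM)),
            "actions": rng.normal(size=(HORIZON, ACT_DIM)),
            "rewards": rng.normal(size=HORIZON),
        }
        for _ in range(n)
    ]

def sample_conditional_diffusion(n, target_return):
    """Placeholder for the return-conditioned diffusion sampler; generated
    trajectories come without reward labels."""
    return [
        {
            "observations": rng.normal(size=(HORIZON, OBS_DIM)),
            "actions": rng.normal(size=(HORIZON, ACT_DIM)),
        }
        for _ in range(n)
    ]

def reward_generator(traj):
    """Placeholder for the learned reward generator that labels generated transitions."""
    return rng.normal(size=len(traj["observations"]))

def stitch(orig, gen, threshold=2.0):
    """Illustrative stitching rule: splice the generated tail onto the original
    prefix at the closest pair of states (the paper's criterion may differ)."""
    d = np.linalg.norm(
        orig["observations"][:, None] - gen["observations"][None, :], axis=-1
    )
    i, j = np.unravel_index(d.argmin(), d.shape)
    if d[i, j] > threshold:
        return None
    return {
        "observations": np.concatenate([orig["observations"][: i + 1], gen["observations"][j:]]),
        "actions": np.concatenate([orig["actions"][: i + 1], gen["actions"][j:]]),
        "rewards": np.concatenate([orig["rewards"][: i + 1], reward_generator(gen)[j:]]),
    }

# Build the augmented dataset and fit a shallow behavioral-cloning policy
# (here, a simple least-squares linear map from observations to actions).
dataset = make_trajectories(100)
generated = sample_conditional_diffusion(100, target_return=1.0)
augmented = dataset + [t for o, g in zip(dataset, generated) if (t := stitch(o, g)) is not None]

X = np.concatenate([t["observations"] for t in augmented])
Y = np.concatenate([t["actions"] for t in augmented])
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
policy = lambda obs: obs @ W
print("BC policy weights shape:", W.shape)
```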

Publication date: 2 Feb 2024
Project Page: Not Provided
Paper: https://arxiv.org/pdf/2402.00807