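The paper introduces Diffusion-ES, a method that combines gradient-free evolutionary search with trajectory denoising to optimize non-differentiable objectives, and applies it to planning for autonomous driving. Diffusion-ES samples trajectories from a diffusion model, scores them with a black-box reward function, and mutates the high-scoring ones using a truncated diffusion process. This process applies only a small number of noising and denoising steps, so a mutation stays near its parent trajectory instead of being resampled from scratch. The paper reports that Diffusion-ES outperforms existing sampling-based planners, reactive policies, and reward-gradient optimization on nuPlan, a closed-loop planning benchmark. It can also optimize non-differentiable language-shaped reward functions generated by few-shot prompting of a large language model. When guided by instructions from a human teacher, it can generate novel, complex behaviors that are not present in the training data, which lets it solve difficult driving scenarios.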
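
The sample, score, and mutate loop described above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation. The learned trajectory diffusion model is replaced by a hypothetical toy denoiser: the exact posterior mean for a Gaussian prior over 1-D lateral offsets. Sampling uses deterministic DDIM steps with a cosine-style noise schedule. Selection keeps the top-k trajectories unchanged (elitism), which may differ from the paper's selection rule. The reward, step counts, and population sizes are illustrative, and the names `predict_x0`, `denoise`, `sample`, `mutate`, `reward`, and `diffusion_es` are hypothetical.

```python
import math
import random
from typing import Callable, List

Trajectory = List[float]

# Hypothetical toy setup (not from the paper): a trajectory is HORIZON lateral
# offsets, and the "diffusion model" is the exact denoiser for a Gaussian prior
# N(PRIOR_MEAN, PRIOR_STD^2) per waypoint, standing in for a learned network.
T = 50                              # number of diffusion steps (assumed)
HORIZON = 8                         # waypoints per trajectory (assumed)
PRIOR_MEAN, PRIOR_STD = 0.0, 1.0    # toy "training data" distribution

# Cosine-style schedule: ALPHA_BAR[0] = 1 (clean), ALPHA_BAR[T] close to 0.
ALPHA_BAR = [math.cos(0.5 * math.pi * t / (T + 1)) ** 2 for t in range(T + 1)]


def predict_x0(x_t: Trajectory, t: int) -> Trajectory:
    """Exact E[x0 | x_t] under the Gaussian prior (a learned model would go here)."""
    a = math.sqrt(ALPHA_BAR[t])
    var_t = a * a * PRIOR_STD**2 + (1.0 - ALPHA_BAR[t])
    gain = a * PRIOR_STD**2 / var_t
    return [PRIOR_MEAN + gain * (x - a * PRIOR_MEAN) for x in x_t]


def denoise(x_t: Trajectory, t_start: int) -> Trajectory:
    """Deterministic DDIM steps from noise level t_start down to 0."""
    x = list(x_t)
    for t in range(t_start, 0, -1):
        x0 = predict_x0(x, t)
        a_t, b_t = math.sqrt(ALPHA_BAR[t]), math.sqrt(1.0 - ALPHA_BAR[t])
        a_prev = math.sqrt(ALPHA_BAR[t - 1])
        b_prev = math.sqrt(1.0 - ALPHA_BAR[t - 1])
        eps = [(xi - a_t * x0i) / b_t for xi, x0i in zip(x, x0)]
        x = [a_prev * x0i + b_prev * ei for x0i, ei in zip(x0, eps)]
    return x


def sample(rng: random.Random) -> Trajectory:
    """Draw a trajectory by denoising pure Gaussian noise from step T."""
    return denoise([rng.gauss(0.0, 1.0) for _ in range(HORIZON)], T)


def mutate(traj: Trajectory, t_trunc: int, rng: random.Random) -> Trajectory:
    """Truncated diffusion: noise only up to step t_trunc, then denoise back."""
    a, b = math.sqrt(ALPHA_BAR[t_trunc]), math.sqrt(1.0 - ALPHA_BAR[t_trunc])
    noisy = [a * x + b * rng.gauss(0.0, 1.0) for x in traj]
    return denoise(noisy, t_trunc)


def reward(traj: Trajectory) -> float:
    """Hypothetical black-box reward with a non-differentiable step penalty:
    end near lateral offset 1.5 and lose 1.0 per waypoint beyond |2.5|."""
    off_road = sum(1 for x in traj if abs(x) > 2.5)
    return -abs(traj[-1] - 1.5) - 1.0 * off_road


def diffusion_es(
    score: Callable[[Trajectory], float],
    pop_size: int = 32,
    n_elite: int = 8,
    generations: int = 10,
    t_trunc: int = 10,
    seed: int = 0,
) -> Trajectory:
    rng = random.Random(seed)
    population = [sample(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Only reward values are used; no gradients of `score` are needed.
        elites = sorted(population, key=score, reverse=True)[:n_elite]
        # Keep elites as-is and fill the rest with mutated copies of them.
        children = [mutate(rng.choice(elites), t_trunc, rng)
                    for _ in range(pop_size - n_elite)]
        population = elites + children
    return max(population, key=score)
```

Calling `diffusion_es(reward)` returns the highest-scoring trajectory found. Because the elites are carried over unchanged, the best reward never decreases from one generation to the next. A small `t_trunc` keeps each child close to its parent, while `t_trunc = T` approaches drawing a fresh sample.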

Publication date: 12 Feb 2024
Project Page: diffusion-es.github.io
Paper: https://arxiv.org/pdf/2402.06559