This paper presents ChatSim, a system that enables editing of photo-realistic 3D driving scene simulations through natural language commands. It targets three limitations of existing scene simulation approaches: inefficient user interaction, the lack of photo-realistic multi-camera rendering, and the difficulty of integrating external digital assets. ChatSim uses a collaborative framework of large language model (LLM) agents to support flexible commands, and a novel multi-camera neural radiance field (NeRF) method to produce photo-realistic results. It also employs a multi-camera lighting estimation method so that inserted assets are rendered consistently with the scene's lighting. Experiments on the Waymo Open Dataset show that ChatSim can handle complex language commands and generate the corresponding photo-realistic scene videos.
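To make the idea of "LLM agent collaboration" more concrete, the toy sketch below shows the general pattern of a manager that decomposes a compound command and dispatches each piece to a specialist agent that edits a shared scene state. It is not ChatSim's implementation: the `Scene` fields, the agent functions (`view_agent`, `add_agent`, `delete_agent`), and the `manager` routine are all hypothetical, and simple keyword matching stands in for the LLM calls a real system would make.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Scene:
    # Hypothetical stand-in for an editable driving-scene state.
    camera_shift_m: float = 0.0
    vehicles: List[str] = field(default_factory=list)
    removed: List[str] = field(default_factory=list)


def view_agent(scene: Scene, cmd: str) -> None:
    # Toy view-adjustment specialist: moves the virtual camera forward 2 m.
    scene.camera_shift_m += 2.0


def add_agent(scene: Scene, cmd: str) -> None:
    # Toy asset specialist: inserts the object named by the command's last word.
    scene.vehicles.append(cmd.split()[-1])


def delete_agent(scene: Scene, cmd: str) -> None:
    # Toy deletion specialist: marks every current vehicle as removed.
    scene.removed.extend(scene.vehicles)
    scene.vehicles.clear()


# Keyword routing table; a real system would ask an LLM to choose the agent.
AGENTS: Dict[str, Callable[[Scene, str], None]] = {
    "move": view_agent,
    "add": add_agent,
    "remove": delete_agent,
}


def manager(scene: Scene, command: str) -> List[str]:
    """Split a compound command on ' and ' and dispatch each sub-command in order."""
    log: List[str] = []
    for part in command.lower().split(" and "):
        part = part.strip()
        if not part:
            continue  # skip empty fragments, e.g. from an empty command
        agent = AGENTS.get(part.split()[0])
        if agent is None:
            log.append(f"unhandled: {part}")
            continue
        agent(scene, part)
        log.append(f"{agent.__name__}: {part}")
    return log


if __name__ == "__main__":
    scene = Scene(vehicles=["car_1"])
    for entry in manager(scene, "Remove all cars and add a police_car and move the camera forward"):
        print(entry)
    print(scene)
```

Running the usage example routes the three sub-commands to the deletion, asset, and view agents in order, leaving `police_car` as the only vehicle and the camera shifted by 2 m. Keeping each specialist narrow and letting a coordinator handle decomposition is what allows a single free-form command to trigger several independent scene edits.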

Publication date: 8 Feb 2024
Project Page: https://github.com/yifanlu0227/ChatSim
Paper: https://arxiv.org/pdf/2402.05746