The paper introduces VideoDirectorGPT, a novel framework for consistent multi-scene video generation that leverages the knowledge of Large Language Models (LLMs). The process begins with a single text prompt, which the video planner LLM (GPT-4) expands into a detailed video plan: scene descriptions, entities with their spatial layouts, backgrounds, and consistency groupings that mark which entities and backgrounds recur across scenes. This plan then guides the video generator, Layout2Vid, which controls spatial layouts within each scene and maintains temporal consistency of entities and backgrounds across scenes. The framework demonstrates improved layout and movement control in video generation, produces videos that remain visually consistent across scenes, and can additionally adjust the strength of layout guidance dynamically and generate videos from user-provided images.
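To make the two-stage structure concrete, below is a minimal sketch of what an LLM-produced video plan might look like as a data structure. The class names, fields, and example values are illustrative assumptions for this summary, not the paper's actual schema or API; they only mirror the four plan components the paper describes (scene descriptions, entity layouts, backgrounds, consistency groupings).

```python
from dataclasses import dataclass, field

# Hypothetical schema for the planner's output; names and fields are
# illustrative assumptions, not the paper's actual format.

BBox = tuple[float, float, float, float]  # (x0, y0, x1, y1), normalized to [0, 1]

@dataclass
class EntityLayout:
    name: str          # entity name, e.g. "chef"
    boxes: list[BBox]  # one bounding box per frame of the scene

@dataclass
class Scene:
    description: str   # text prompt for this scene
    background: str    # background description
    entities: list[EntityLayout] = field(default_factory=list)

@dataclass
class VideoPlan:
    scenes: list[Scene]
    # Consistency groupings: names of entities/backgrounds that must keep
    # the same appearance wherever they recur across scenes.
    consistency_groups: list[list[str]] = field(default_factory=list)

# Example: a two-scene plan in which the same chef and kitchen recur, so a
# generator can reuse a shared identity for them in both scenes.
plan = VideoPlan(
    scenes=[
        Scene(
            description="a chef chops vegetables on a counter",
            background="a sunlit kitchen",
            entities=[EntityLayout("chef", boxes=[(0.1, 0.2, 0.5, 0.9)])],
        ),
        Scene(
            description="the chef plates the finished dish",
            background="a sunlit kitchen",
            entities=[EntityLayout("chef", boxes=[(0.3, 0.2, 0.7, 0.9)])],
        ),
    ],
    consistency_groups=[["chef"], ["a sunlit kitchen"]],
)
```

In this reading, the generator stage renders each scene following its layout boxes and reuses a shared identity for any name in a consistency group; the dynamically controllable layout-guidance strength mentioned above would amount to a scalar parameter passed through to the renderer.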

Publication date: 26 Sep 2023
Project Page: videodirectorgpt.github.io
Paper: https://arxiv.org/pdf/2309.15091