The article presents a new method for visual dubbing: regenerating an actor's lip motions in a video so they synchronize with new audio. Built on data-efficient neural rendering priors, the method overcomes the trade-off between existing person-generic models (scalable but lower quality) and person-specific models (high quality but data-hungry), enabling high-quality visual dubbing for any actor from just a few seconds of footage. The study demonstrates that this approach outperforms existing models in visual quality and in how recognizable the dubbed actor remains, both quantitatively and qualitatively.
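
To make the two-stage idea concrete, here is a minimal PyTorch sketch of the pre-train-a-prior-then-fine-tune pattern the summary describes. Everything below is illustrative rather than the paper's actual implementation: `DubbingRenderer`, `pretrain_prior`, and `finetune_actor` are hypothetical names, the toy MLP stands in for the neural renderer, and the backbone/head split is one assumed way to separate the generic prior from person-specific adaptation.

```python
import torch
import torch.nn as nn

class DubbingRenderer(nn.Module):
    """Toy stand-in for a neural renderer: maps an audio feature
    window to mouth-region pixels. Hypothetical, not the paper's model."""
    def __init__(self, audio_dim=64, img_pixels=32 * 32 * 3):
        super().__init__()
        # Shared backbone: intended to hold the person-generic prior.
        self.backbone = nn.Sequential(
            nn.Linear(audio_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
        )
        # Output head: the part adapted per actor in this sketch.
        self.head = nn.Linear(256, img_pixels)

    def forward(self, audio_feat):
        return torch.sigmoid(self.head(self.backbone(audio_feat)))

def pretrain_prior(model, multi_actor_batches, lr=1e-4, steps=1000):
    """Stage 1: train the whole network on footage of many actors so the
    backbone learns a generic audio-to-lip mapping (the 'prior')."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _, (audio, target) in zip(range(steps), multi_actor_batches):
        loss = nn.functional.l1_loss(model(audio), target)
        opt.zero_grad(); loss.backward(); opt.step()

def finetune_actor(model, few_second_clip, lr=1e-4, steps=200):
    """Stage 2: adapt to a new actor from a few seconds of footage.
    Here only the head is updated; the pretrained backbone is frozen."""
    for p in model.backbone.parameters():
        p.requires_grad = False
    opt = torch.optim.Adam(model.head.parameters(), lr=lr)
    audio, target = few_second_clip
    for _ in range(steps):
        loss = nn.functional.l1_loss(model(audio), target)
        opt.zero_grad(); loss.backward(); opt.step()

if __name__ == "__main__":
    # Smoke test with random tensors in place of real audio/video features.
    model = DubbingRenderer()
    fake_batches = ((torch.randn(8, 64), torch.rand(8, 32 * 32 * 3))
                    for _ in range(100))
    pretrain_prior(model, fake_batches, steps=100)
    finetune_actor(model, (torch.randn(8, 64), torch.rand(8, 32 * 32 * 3)))
```

The key design point this sketch illustrates is data efficiency: most parameters are learned once from plentiful multi-actor data, so the per-actor stage has few parameters to fit and can succeed on seconds of footage.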

Publication date: 11 Jan 2024
Project Page: https://dubbingforeveryone.github.io/
Paper: https://arxiv.org/pdf/2401.06126