TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation

The paper introduces TransFace, a model that translates audio-visual speech directly into other languages, avoiding the delay and cascading errors of current cascaded pipelines. TransFace has two parts: a speech-to-unit translation component, and Unit2Lip, a unit-based audio-visual speech synthesizer. It also adds a Bounded Duration Predictor, which keeps the translated talking head isometric (the same length as the source video) so that reference frames do not have to be duplicated. The authors report significant improvements in synchronization and inference speed, along with strong BLEU scores.
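The summary does not say how the Bounded Duration Predictor enforces the length constraint. As a rough illustration only, the sketch below assumes one plausible reading: per-unit durations are rescaled so that they sum exactly to the number of frames in the source video. The function name `bound_durations`, its parameters, and the largest-remainder rounding are all assumptions made for this example, not the authors' implementation.

```python
from typing import List


def bound_durations(predicted: List[float], total_frames: int) -> List[int]:
    """Rescale predicted unit durations so they sum exactly to total_frames.

    Hypothetical helper: illustrates the isometric constraint only, not the
    method used in the TransFace paper.
    """
    if not predicted:
        raise ValueError("predicted must contain at least one duration")
    if total_frames < 0:
        raise ValueError("total_frames must be non-negative")
    total = sum(predicted)
    if total <= 0:
        raise ValueError("predicted durations must sum to a positive value")

    # Scale every duration by the same factor so the total matches the video.
    scaled = [d * total_frames / total for d in predicted]

    # Round down first, then hand the leftover frames to the units whose
    # scaled durations had the largest fractional parts.
    frames = [int(s) for s in scaled]
    leftover = max(total_frames - sum(frames), 0)
    by_fraction = sorted(
        range(len(scaled)),
        key=lambda i: scaled[i] - frames[i],
        reverse=True,
    )
    for i in by_fraction[:leftover]:
        frames[i] += 1
    return frames


if __name__ == "__main__":
    # Three units whose raw durations sum to 7, fitted to a 10-frame video.
    print(bound_durations([1.2, 2.7, 3.1], 10))
```

Because the output always sums to `total_frames`, the synthesized speech would span exactly the available video frames, which is the property the summary describes as isometric translation.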

Publication date: 23 Dec 2023
Project Page: https://transface-demo.github.io/
Paper: https://arxiv.org/pdf/2312.15197
