The article introduces ‘Anim-400K’, a large-scale dataset designed to support the automated end-to-end dubbing of video content. With roughly 60% of online content published in English but only 18.8% of the global population speaking English, there is a clear disparity in access to information. Automated dubbing – replacing a video's audio track with a translated alternative – remains complex due to the need for precise timing, facial movement synchronization, and prosody matching. The Anim-400K dataset, comprising over 425K aligned animated video segments in Japanese and English, is designed to support a range of tasks, including automated dubbing, simultaneous translation, video summarization, and genre/theme/style classification.

Publication date: 11 Jan 2024
Project Page: https://github.com/davidmchan/Anim400K
Paper: https://arxiv.org/pdf/2401.05314