The article introduces ‘Anim-400K’, a large-scale dataset designed to aid in the automated end-to-end dubbing of video content. With roughly 60% of online content published in English but only 18.8% of the global population speaking English, there is a clear disparity in access to information. Automated dubbing – replacing the audio track of a video with a translated alternative – remains complex due to the need for precise timing, facial movement synchronization, and prosody matching. The Anim-400K dataset, comprising over 425K aligned animated video segments in Japanese and English, aims to support tasks including automated dubbing, simultaneous translation, video summarization, and genre/theme/style classification.
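The dataset's actual loading interface is not described above, so the sketch below is purely illustrative: it assumes a hypothetical JSON manifest (`anim400k/manifest.json`, with made-up field names) that pairs each Japanese segment with its English counterpart, the basic structure an end-to-end dubbing or translation pipeline would consume. Consult the project page for the real schema.

```python
import json
from pathlib import Path

# Hypothetical layout: a JSON manifest listing aligned segment pairs.
# Field names here are illustrative, not the dataset's actual schema.
MANIFEST = Path("anim400k/manifest.json")


def iter_aligned_pairs(manifest_path: Path):
    """Yield (japanese_clip, english_clip, metadata) for each aligned segment."""
    with manifest_path.open(encoding="utf-8") as f:
        segments = json.load(f)
    for seg in segments:
        yield seg["ja_clip"], seg["en_clip"], {
            "start": seg["start_sec"],        # segment start within the episode
            "duration": seg["duration_sec"],  # aligned clips share one duration
            "genre": seg.get("genre"),        # property labels would enable the
        }                                     # classification tasks noted above


if __name__ == "__main__":
    for ja_clip, en_clip, meta in iter_aligned_pairs(MANIFEST):
        print(f"{ja_clip} <-> {en_clip} ({meta['duration']}s)")
        break  # inspect only the first aligned pair
```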
Publication date: 11 Jan 2024
Project Page: https://github.com/davidmchan/Anim400K
Paper: https://arxiv.org/pdf/2401.05314