This paper presents Video2Music, an AI framework that generates music matching the emotion of a given video. The authors curated a collection of music videos and extracted semantic, scene offset, motion, and emotion features from them, releasing the result as a new dataset, MuVi-Sync. These features condition an Affective Multimodal Transformer model, which generates music that is affectively aligned with the video, and a post-processing step dynamically renders rhythm and volume. A user study confirmed that the generated music matches the video content in terms of emotion.
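
The overall pipeline (feature extraction, transformer-based generation, dynamic post-processing) can be illustrated with the minimal Python sketch below. All function names, feature dimensions, and mappings are placeholder assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def extract_video_features(frames):
    """Placeholder extraction of the four feature types named in the paper
    (semantic, scene offset, motion, emotion); real extractors are not shown."""
    n = len(frames)
    return {
        "semantic": np.random.rand(n, 512),   # e.g. per-frame embeddings (assumed size)
        "scene_offset": np.arange(n),         # frames elapsed since the last scene cut
        "motion": np.random.rand(n),          # per-frame motion magnitude
        "emotion": np.random.rand(n, 2),      # e.g. valence/arousal scores
    }

def affective_multimodal_transformer(features):
    """Stand-in for the AMT model: maps video features to a chord sequence."""
    n = len(features["motion"])
    chords = ["C:maj", "A:min", "F:maj", "G:maj"]
    return [chords[i % len(chords)] for i in range(n // 30 + 1)]  # ~1 chord/sec at 30 fps

def post_process(chords, features):
    """Dynamic rendering: scale note density and volume with motion/emotion
    (illustrative heuristic, not the paper's method)."""
    motion = float(np.mean(features["motion"]))
    arousal = float(np.mean(features["emotion"][:, 1]))
    return {
        "chords": chords,
        "note_density": 2 + int(4 * motion),  # busier rhythm for high-motion video
        "velocity": int(60 + 60 * arousal),   # louder dynamics for high arousal
    }

if __name__ == "__main__":
    fake_frames = [None] * 90                 # stand-in for 3 s of video at 30 fps
    feats = extract_video_features(fake_frames)
    music = post_process(affective_multimodal_transformer(feats), feats)
    print(music)
```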


Publication date: 3 Nov 2023
Project Page: https://arxiv.org/abs/2311.00968v1
Paper: https://arxiv.org/pdf/2311.00968