The paper introduces ‘JustLMD’, a multimodal dataset comprising dance motion, music, and lyrics, intended for generating 3D dance motion conditioned on both music and lyrics. The authors argue that while most work on dance synthesis focuses on music-to-dance generation, the semantic information carried by lyrics is often overlooked; incorporating lyrics enriches the generation process and grounds the dance in the song's semantic content. The dataset contains 4.6 hours of 3D dance motion across 1,867 sequences, paired with musical tracks and their corresponding English lyrics. The authors also present a cross-modal diffusion model for dance synthesis conditioned on both modalities.
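To make the idea of cross-modal conditioning concrete, below is a minimal, hypothetical sketch of a denoiser that attends to music and lyric features via cross-attention. This is not the authors' architecture; all module names, feature dimensions, and the overall layout are illustrative assumptions.

```python
# Hypothetical sketch of cross-modal conditioning for dance diffusion.
# Dimensions and module choices are placeholders, not the paper's design.
import torch
import torch.nn as nn


class CrossModalDenoiser(nn.Module):
    def __init__(self, motion_dim=147, cond_dim=256, n_heads=4):
        super().__init__()
        self.motion_proj = nn.Linear(motion_dim, cond_dim)
        self.time_embed = nn.Sequential(
            nn.Linear(1, cond_dim), nn.SiLU(), nn.Linear(cond_dim, cond_dim)
        )
        # Separate projections for music features (e.g. audio descriptors)
        # and lyric embeddings (e.g. from a text encoder); sizes are assumed.
        self.music_proj = nn.Linear(128, cond_dim)
        self.lyric_proj = nn.Linear(768, cond_dim)
        self.cross_attn = nn.MultiheadAttention(cond_dim, n_heads, batch_first=True)
        self.out = nn.Linear(cond_dim, motion_dim)

    def forward(self, noisy_motion, t, music_feats, lyric_feats):
        # noisy_motion: (B, T, motion_dim); t: (B, 1) diffusion timestep
        h = self.motion_proj(noisy_motion) + self.time_embed(t).unsqueeze(1)
        # Concatenate both conditioning streams along the sequence axis so
        # motion tokens can attend jointly to music and lyric tokens.
        cond = torch.cat(
            [self.music_proj(music_feats), self.lyric_proj(lyric_feats)], dim=1
        )
        h, _ = self.cross_attn(h, cond, cond)
        return self.out(h)  # predicted noise (or clean motion, per convention)


# Usage with random tensors standing in for real features.
model = CrossModalDenoiser()
x = torch.randn(2, 120, 147)          # 120 motion frames per sequence
t = torch.rand(2, 1)                  # normalized diffusion timesteps
music = torch.randn(2, 120, 128)      # per-frame audio features
lyrics = torch.randn(2, 32, 768)      # lyric token embeddings
eps_hat = model(x, t, music, lyrics)  # -> (2, 120, 147)
```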

Publication date: 30 Sep 2023
arXiv page: https://arxiv.org/abs/2310.00455v1
Paper: https://arxiv.org/pdf/2310.00455