The paper proposes the Deep Joint Cascade Model (DJCM) for two music information retrieval tasks: singing voice separation and vocal pitch estimation. Existing methods fall into two groups, pipeline methods and naive joint learning methods, and both have limitations. DJCM uses a joint cascade structure to train both tasks concurrently and aligns their different objectives with task-specific weights. Experiments show that DJCM achieves state-of-the-art performance on both tasks, with significant improvements in Signal-to-Distortion Ratio (SDR) for separation and Overall Accuracy (OA) for pitch estimation. The code is available online.
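
The idea of combining two task objectives with task-specific weights can be illustrated with a minimal sketch. This is not the DJCM implementation: the function name `joint_loss`, the weight values, and the loss values below are hypothetical placeholders, and the sketch assumes that a naive joint objective is a plain unweighted sum of the two losses.

```python
def joint_loss(separation_loss: float, pitch_loss: float,
               w_separation: float = 1.0, w_pitch: float = 1.0) -> float:
    """Weighted sum of two task losses.

    Hypothetical helper for illustration; not taken from the DJCM code.
    """
    if w_separation < 0 or w_pitch < 0:
        raise ValueError("task weights must be non-negative")
    return w_separation * separation_loss + w_pitch * pitch_loss


# Placeholder loss values, chosen only for illustration.
sep_loss, pitch_loss = 0.8, 0.2

# Unweighted sum: both weights default to 1.0.
naive = joint_loss(sep_loss, pitch_loss)

# Task-specific weights rebalance how much each task contributes.
weighted = joint_loss(sep_loss, pitch_loss, w_separation=0.5, w_pitch=2.0)
```

With equal weights the larger separation loss dominates the total, while the weighted variant lets both tasks contribute comparably. How the weights are actually chosen in DJCM is described in the paper and is not reproduced here.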

Publication date: 11 Jan 2024
Project Page: https://github.com/Dream-High/DJCM
Paper: https://arxiv.org/pdf/2401.03856