CoMoSVC: Consistency Model-based Singing Voice Conversion

The paper introduces CoMoSVC, a consistency model-based Singing Voice Conversion (SVC) method that aims for both high-quality generation and high-speed sampling. The authors first design a diffusion-based teacher model specifically for SVC, then distill a student model from it under self-consistency properties so that sampling takes a single step. Experiments show that CoMoSVC runs inference significantly faster than the state-of-the-art diffusion-based SVC system while delivering comparable or superior conversion performance. Audio samples and code are available online.
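
To make the idea of one-step sampling concrete, here is a minimal sketch of a consistency function in the general style of the consistency-models literature. It is not the authors' code: the constants, the toy_network placeholder, and the function names are all assumptions chosen for illustration, and the conditioning inputs an SVC system would need are omitted. The point it shows is that the skip-connection parameterization returns its input unchanged at the smallest noise level, and that a sample is produced with a single function call starting from pure noise.

```python
import math
import random

# Assumed constants for illustration only (not taken from the CoMoSVC paper).
SIGMA_DATA = 0.5   # assumed standard deviation of the data
EPS = 0.002        # smallest noise level
T_MAX = 80.0       # largest noise level


def c_skip(t):
    """Weight on the noisy input; equals 1 at t = EPS."""
    return SIGMA_DATA ** 2 / ((t - EPS) ** 2 + SIGMA_DATA ** 2)


def c_out(t):
    """Weight on the network output; equals 0 at t = EPS."""
    return SIGMA_DATA * (t - EPS) / math.sqrt(SIGMA_DATA ** 2 + t ** 2)


def toy_network(x, t):
    """Hypothetical stand-in for a trained student network (returns zeros)."""
    return [0.0 for _ in x]


def consistency_fn(x, t, network=toy_network):
    """Map a noisy point at noise level t to an estimate of the clean point."""
    out = network(x, t)
    return [c_skip(t) * xi + c_out(t) * oi for xi, oi in zip(x, out)]


def one_step_sample(dim, rng, network=toy_network):
    """Draw Gaussian noise at the largest noise level and denoise it in one call."""
    x_T = [rng.gauss(0.0, T_MAX) for _ in range(dim)]
    return consistency_fn(x_T, T_MAX, network)


if __name__ == "__main__":
    # Boundary condition: at t = EPS the function is the identity.
    print(consistency_fn([1.0, -2.0], EPS))
    # One-step sampling: a single evaluation instead of an iterative loop.
    print(one_step_sample(4, random.Random(0)))
```

A diffusion teacher would instead refine the sample over many iterations, which is where a one-step student saves inference time.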

Publication date: 3 Jan 2024
Project Page: https://comosvc.github.io/
Paper: https://arxiv.org/pdf/2401.01792
