VITS-based Singing Voice Conversion System with DSPGAN post-processing for SVCC2023

This paper presents the T02 team’s system for the Singing Voice Conversion Challenge 2023 (SVCC2023). The system is a VITS-based SVC model built from three modules: a feature extractor, a voice converter, and a post-processor. The feature extractor provides F0 contours and uses a HuBERT model to extract speaker-independent linguistic content from the input singing voice. The voice converter recombines the target speaker's timbre with the F0 and linguistic content to generate the target speaker's waveform. The post-processor is a fine-tuned DSPGAN vocoder that resynthesises the waveform to improve audio quality. The system achieved superior performance in the challenge, especially in the cross-domain task.
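The three-module data flow described above can be sketched roughly as follows. This is only an illustrative skeleton, not the authors' implementation: every function name, the `Features` container, the 320-sample frame size, and the 4-dimensional content vectors are assumptions made for the example, and each stage is a placeholder that returns zeros where a real system would run a pitch tracker, HuBERT, a VITS-style model, and a DSPGAN vocoder.

```python
from dataclasses import dataclass
from typing import List

FRAME_SIZE = 320  # assumed hop size in samples; hypothetical, not from the paper


@dataclass
class Features:
    f0: List[float]             # one F0 value per frame, in Hz (0.0 = unvoiced)
    content: List[List[float]]  # one speaker-independent content vector per frame


def extract_features(waveform: List[float], content_dim: int = 4) -> Features:
    """Placeholder feature extractor.

    A real system would run a pitch tracker for F0 and a HuBERT model for
    linguistic content; here both outputs are zero-filled stand-ins.
    """
    n_frames = len(waveform) // FRAME_SIZE
    f0 = [0.0] * n_frames
    content = [[0.0] * content_dim for _ in range(n_frames)]
    return Features(f0=f0, content=content)


def convert_voice(features: Features, target_speaker_id: int) -> List[float]:
    """Placeholder voice converter.

    A real converter would combine target timbre, F0, and content to
    generate audio; this stand-in returns silence of the matching length.
    """
    return [0.0] * (len(features.f0) * FRAME_SIZE)


def post_process(waveform: List[float]) -> List[float]:
    """Placeholder post-processor standing in for vocoder resynthesis."""
    return list(waveform)


def run_pipeline(source_waveform: List[float], target_speaker_id: int) -> List[float]:
    """Chain the three stages: extract -> convert -> post-process."""
    features = extract_features(source_waveform)
    converted = convert_voice(features, target_speaker_id)
    return post_process(converted)
```

For instance, calling `run_pipeline` on a 3200-sample input yields a 3200-sample output, which shows only that the stages chain together with consistent frame counts, not anything about audio quality.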

Publication date: 10 Oct 2023
Project Page: Not provided
Paper: https://arxiv.org/pdf/2310.05118
