Noise-robust zero-shot text-to-speech synthesis conditioned on self-supervised speech-representation model with adapters
This article discusses the development of a noise-robust zero-shot text-to-speech (TTS) method. The method, based on speaker embeddings extracted from reference speech using self-supervised learning (SSL) speech representations, can reproduce…