Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts
This paper introduces a zero-shot text-to-speech (TTS) model that can replicate the voice of an unseen speaker without requiring any adaptation parameters. The model uses multi-scale acoustic…