Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction
The paper presents a new text-to-speech (TTS) framework that utilizes a neural transducer. It divides the TTS pipeline into semantic-level sequence-to-sequence modeling and fine-grained acoustic modeling stages. The framework uses…
Continue reading