Expressive TTS Driven by Natural Language Prompts Using Few Human Annotations

The article presents FreeStyleTTS, a model for expressive text-to-speech (TTS) synthesis that requires few human annotations. The approach uses a large language model (LLM) to recast expressive TTS as a style retrieval task: given an external style prompt, the model selects the best-matching style references, which then guide the TTS pipeline to synthesize speech in the intended style. The article shows that the model can retrieve the desired style from either the input text or a user-defined description, producing synthesized speech that closely matches the specified style.
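
To make the idea of "style retrieval" concrete, here is a minimal sketch, not the paper's implementation. It assumes that the style prompt and each style reference have already been mapped to embedding vectors, and it picks the reference with the highest cosine similarity. The names `STYLE_BANK`, `retrieve_style`, and the three-dimensional toy vectors are hypothetical; how FreeStyleTTS actually scores references with the LLM may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def retrieve_style(prompt_embedding, style_bank):
    """Return (name, score) of the style reference closest to the prompt."""
    scored = (
        (name, cosine_similarity(prompt_embedding, embedding))
        for name, embedding in style_bank.items()
    )
    return max(scored, key=lambda pair: pair[1])


# Hypothetical toy embeddings standing in for a small set of annotated
# style references (assumption: real embeddings would come from a model).
STYLE_BANK = {
    "cheerful": [0.9, 0.1, 0.0],
    "sad": [0.0, 0.2, 0.9],
    "angry": [0.1, 0.9, 0.1],
}

# Hypothetical embedding of a prompt such as "say this happily".
prompt_embedding = [0.8, 0.2, 0.1]

best_name, best_score = retrieve_style(prompt_embedding, STYLE_BANK)
print(best_name, round(best_score, 3))
```

In this toy setup the selected reference would then be passed to the TTS pipeline as the style condition, so only the small bank of references needs human annotation.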

Publication date: 3 Nov 2023
Paper: https://arxiv.org/pdf/2311.01260

Expressive text-to-speech, Large Language Model, ReFlow-TTS, style annotation, style control
