February 23, 2024

Daisy-TTS: Simulating Wider Spectrum of Emotions via Prosody Embedding Decomposition

The study presents Daisy-TTS, a text-to-speech system that simulates a broad spectrum of emotions. It uses a prosody encoder to learn an emotionally separable prosody embedding, which acts as a proxy for emotion. This allows the system to simulate primary and secondary emotions, intensity levels, and emotion polarity. In perceptual evaluations, the system showed higher emotional speech naturalness and emotion perceivability than the baseline.
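
The summary does not spell out how the embedding is manipulated. As a rough illustration of how a single vector-valued emotion proxy could cover secondary emotions, intensity levels, and polarity, the toy sketch below assumes three simple operations: mixing, scaling, and negation. The 3-dimensional vectors, the function names (`mix`, `scale`, `negate`), and the operations themselves are hypothetical and are not taken from the Daisy-TTS code.

```python
# Toy illustration only: the vectors and operations below are assumptions,
# not Daisy-TTS's actual embeddings or implementation.

def scale(vec, factor):
    """Scale an embedding, standing in for a change in emotion intensity."""
    return [factor * x for x in vec]

def negate(vec):
    """Flip the sign of an embedding, standing in for emotion polarity."""
    return scale(vec, -1.0)

def mix(vec_a, vec_b, weight=0.5):
    """Blend two primary-emotion embeddings into a secondary-emotion one."""
    return [weight * a + (1.0 - weight) * b for a, b in zip(vec_a, vec_b)]

# Hypothetical 3-dimensional "primary emotion" embeddings.
joy = [1.0, 0.0, 0.0]
sadness = [0.0, 1.0, 0.0]

bittersweet = mix(joy, sadness)   # secondary emotion as a mixture
strong_joy = scale(joy, 2.0)      # higher intensity
opposite_of_joy = negate(joy)     # polarity flip
```

In a real system the resulting vector would condition the speech decoder; here it only shows that one embedding space can, in principle, express all three kinds of variation.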

Publication date: 22 Feb 2024
Project Page: https://rendchevi.github.io/daisy-tts/
Paper: https://arxiv.org/pdf/2402.14523


Daisy-TTS, emotional speech simulation, emotional text-to-speech, prosody embedding, structural model of emotions
