February 23, 2024

Daisy-TTS: Simulating Wider Spectrum of Emotions via Prosody Embedding Decomposition

The study presents Daisy-TTS, a text-to-speech system designed to simulate a broad spectrum of emotions. It uses a prosody encoder to learn an emotionally separable prosody embedding, which acts as a proxy for emotion. By manipulating this embedding, the system can simulate primary and secondary emotions, intensity levels, and emotion polarity. In perceptual evaluations, Daisy-TTS achieved higher emotional speech naturalness and emotion perceivability than the baseline.
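
To make the idea of an embedding acting as a proxy for emotion more concrete, here is a minimal sketch in Python. It is not the paper's implementation: the three-dimensional vectors, the emotion names, and the helper functions `mix`, `scale`, and `negate` are all hypothetical, and the mapping of secondary emotions to mixing, intensity to scaling, and polarity to negation is an assumption made for illustration. In a real system the embeddings would come from the trained prosody encoder.

```python
from typing import Dict, List

Vector = List[float]

# Hypothetical toy embeddings for two primary emotions.
# Real prosody embeddings would be produced by a trained encoder.
PRIMARY: Dict[str, Vector] = {
    "joy": [1.0, 0.0, 0.5],
    "trust": [0.0, 1.0, 0.5],
}


def scale(v: Vector, k: float) -> Vector:
    """Assumed intensity control: multiply the embedding by a factor k."""
    return [k * x for x in v]


def mix(a: Vector, b: Vector) -> Vector:
    """Assumed secondary emotion: element-wise average of two primary embeddings."""
    return [(x + y) / 2 for x, y in zip(a, b)]


def negate(v: Vector) -> Vector:
    """Assumed polarity flip: point the embedding in the opposite direction."""
    return scale(v, -1.0)


blended = mix(PRIMARY["joy"], PRIMARY["trust"])  # a mixture of two primary emotions
strong_joy = scale(PRIMARY["joy"], 2.0)          # higher intensity
opposite_joy = negate(PRIMARY["joy"])            # opposite polarity

print(blended, strong_joy, opposite_joy)
```

The point of the sketch is only that, once emotions are separable in the embedding space, simple vector operations become a plausible way to reach emotions that were not labeled in the training data.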

Publication date: 22 Feb 2024
Project Page: https://rendchevi.github.io/daisy-tts/
Paper: https://arxiv.org/pdf/2402.14523
