A Discourse-level Multi-scale Prosodic Model for Fine-grained Emotion Analysis

The paper proposes a Discourse-level Multi-scale text Prosodic Model (D-MPM) that predicts prosodic features from text to support fine-grained emotion analysis. These predictions can then guide a speech synthesis system toward more expressive output. To evaluate the model, the authors also introduce a new Discourse-level Chinese Audiobook (DCA) dataset containing more than 13,000 annotated utterances. The model gave promising results both in predicting prosodic features and in improving user experience. Notably, on some user evaluation criteria, speech synthesized with the model's guidance was rated higher than the original speech.

Publication date: 25 Sep 2023
Project Page: Not provided
Paper: https://arxiv.org/pdf/2309.11849


