This paper introduces DurIAN-E, an improved duration informed attention neural network for expressive and high-quality text-to-speech synthesis. DurIAN-E uses multiple stacked SwishRNN-based Transformer blocks as linguistic encoders and incorporates Style-Adaptive Instance Normalization (SAIN) layers to enhance expressiveness. A denoiser is also employed to further improve the quality and expressiveness of the synthesized speech. The model outperforms state-of-the-art approaches in subjective mean opinion score (MOS) and preference tests, demonstrating its effectiveness in synthesizing more natural-sounding speech.
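
To make the SAIN component more concrete, below is a minimal PyTorch sketch of a style-adaptive instance normalization layer: features are instance-normalized and then re-scaled and shifted using a gain and bias predicted from a style embedding. This is not the authors' implementation; the class name, tensor shapes, and the single linear affine predictor are illustrative assumptions.

```python
import torch
import torch.nn as nn

class StyleAdaptiveInstanceNorm(nn.Module):
    """Sketch of SAIN: instance-normalize hidden features, then apply a
    per-channel scale and shift predicted from a style embedding."""

    def __init__(self, hidden_dim: int, style_dim: int):
        super().__init__()
        # Plain instance norm without learned affine parameters;
        # the affine transform comes from the style vector instead.
        self.norm = nn.InstanceNorm1d(hidden_dim, affine=False)
        # One linear layer predicts per-channel gamma and beta from the style embedding.
        self.affine = nn.Linear(style_dim, 2 * hidden_dim)

    def forward(self, x: torch.Tensor, style: torch.Tensor) -> torch.Tensor:
        # x: (batch, hidden_dim, time), style: (batch, style_dim)
        gamma, beta = self.affine(style).chunk(2, dim=-1)
        x = self.norm(x)
        return gamma.unsqueeze(-1) * x + beta.unsqueeze(-1)


if __name__ == "__main__":
    layer = StyleAdaptiveInstanceNorm(hidden_dim=256, style_dim=128)
    feats = torch.randn(2, 256, 100)   # e.g. encoder outputs over 100 frames
    style = torch.randn(2, 128)        # utterance-level style embedding
    print(layer(feats, style).shape)   # torch.Size([2, 256, 100])
```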

Publication date: 25 Sep 2023
Project Page: Not Provided
Paper: https://arxiv.org/pdf/2309.12792