AE-Flow: AutoEncoder Normalizing Flow

This paper examines the use of normalizing flows for voice conversion (VC) and introduces a new training paradigm called AutoEncoder Normalizing Flow (AE-Flow). Normalizing flows are unsupervised generative models that have shown promising results in text-to-speech and VC. AE-Flow adds supervision to the training process without requiring parallel data: a reconstruction loss forces the model to use the conditioning information when reconstructing an audio sample. The study compares AE-Flow with models trained using other loss functions and finds that AE-Flow systematically improves speaker similarity and naturalness.
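To make the idea of adding a reconstruction term concrete, here is a minimal sketch of how such a term could sit next to the negative log-likelihood (NLL) objective that normalizing flows are usually trained with. It is an illustration under assumptions, not the paper's implementation: the function names, the L1 distance, the standard normal prior, and the weights `w_nll` and `w_rec` are all hypothetical, and the summary does not say how the reconstruction `x_hat` is produced.

```python
import math

def standard_normal_nll(z, log_det):
    """Flow NLL: -(log N(z; 0, I) + log|det dz/dx|)."""
    log_prior = sum(-0.5 * v * v - 0.5 * math.log(2 * math.pi) for v in z)
    return -(log_prior + log_det)

def l1_reconstruction(x, x_hat):
    """Mean absolute error between a sample and its reconstruction (assumed distance)."""
    return sum(abs(a - b) for a, b in zip(x, x_hat)) / len(x)

def combined_loss(x, x_hat, z, log_det, w_nll=1.0, w_rec=1.0):
    """Weighted sum of the two terms; the weights are hypothetical knobs."""
    return w_nll * standard_normal_nll(z, log_det) + w_rec * l1_reconstruction(x, x_hat)

# Toy values standing in for model outputs (not real audio features).
x = [0.5, -1.0, 2.0]       # input sample
x_hat = [0.4, -0.8, 2.1]   # reconstruction produced from the conditioning
z = [0.1, -0.2, 0.3]       # latent obtained from the flow's forward pass
log_det = -0.7             # log-determinant of the forward Jacobian

nll_only = combined_loss(x, x_hat, z, log_det, w_nll=1.0, w_rec=0.0)
both = combined_loss(x, x_hat, z, log_det)
rec_only = combined_loss(x, x_hat, z, log_det, w_nll=0.0, w_rec=1.0)
print(nll_only, both, rec_only)
```

Setting one weight to zero recovers a model trained on a single term, which is one simple way to set up a comparison between models trained with different loss functions.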

Publication date: 29 Dec 2023
Project Page: Not provided
Paper: https://arxiv.org/pdf/2312.16552
