The article presents SynSpeech, a new synthetic speech dataset for studying disentangled speech representation learning. Despite the importance of disentangled representations in several applications, progress in this area has been limited by the lack of speech datasets with known generative factors. SynSpeech addresses this gap: it contains a million utterances with ground-truth factors including speaker identity, spoken text, prosody, and emotional tone, offering a benchmark for more rigorous evaluation of disentangled speech representation learning methods.

Publication date: 8 Nov 2023
arXiv page: https://arxiv.org/abs/2311.03389
Paper (PDF): https://arxiv.org/pdf/2311.03389