The article introduces the PHEME model series for efficient, conversational speech generation. Unlike existing models that rely on large neural components and extensive training data, PHEME models are compact and high-performing, and can be trained on smaller-scale conversational data, cutting data demands by more than 10x while delivering quality comparable to state-of-the-art models. The series also supports parallel speech generation and produces natural conversational speech. Moreover, teacher-student distillation can further improve voice quality in single-speaker setups.

Publication date: 5 Jan 2024
arXiv: https://arxiv.org/abs/2401.02839v1
Paper: https://arxiv.org/pdf/2401.02839