An overview of text-to-speech systems and media applications

This article provides a comprehensive overview of Text-to-Speech (TTS) systems and their applications in media. It explains why designing a TTS system is complex: such a system typically requires a text frontend, a predictive model, and a signal-processing vocoder. The article then discusses the shift from conventional concatenative and statistical parametric approaches to neural network-based TTS, which produces higher-quality speech. It also covers the use of TTS in a range of media applications, highlighting its potential to reduce costs and improve efficiency, and concludes with a comparison of recently released TTS systems.
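To make the three-stage structure concrete, here is a minimal toy sketch in Python. It is not taken from the paper: the function names, the character-level "frontend", the lookup-rule "predictive model", the sine-wave "vocoder", and the constants `SAMPLE_RATE` and `HOP` are all hypothetical stand-ins. They are chosen only to show how data flows from text to symbols to acoustic frames to waveform samples. Real systems use full text normalization and grapheme-to-phoneme conversion, a trained acoustic model, and a neural or signal-processing vocoder.

```python
import math
import re

SAMPLE_RATE = 16000  # assumed output sample rate in Hz
HOP = 200            # assumed number of audio samples per acoustic frame

# Hypothetical digit-expansion table used by the toy frontend.
DIGITS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
          "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}


def text_frontend(text: str) -> list[str]:
    """Toy frontend: normalize text and split it into character symbols.

    Only lowercases, spells out digits, drops punctuation, and collapses
    whitespace; a real frontend would also do grapheme-to-phoneme conversion.
    """
    text = text.lower()
    text = re.sub(r"\d", lambda m: f" {DIGITS[m.group()]} ", text)
    text = re.sub(r"[^a-z ]", " ", text)
    text = re.sub(r"\s+", " ", text).strip()
    return list(text)


def predictive_model(symbols: list[str]) -> list[float]:
    """Toy 'predictive model': map each symbol to acoustic frames.

    Each frame holds one fundamental frequency in Hz (0.0 means silence).
    A trained model would predict richer features such as mel-spectrogram
    frames and durations; this fixed rule is purely illustrative.
    """
    frames: list[float] = []
    for s in symbols:
        if s == " ":
            frames.extend([0.0] * 2)  # short pause between words
        else:
            pitch = 100.0 + 5.0 * (ord(s) - ord("a"))
            frames.extend([pitch] * 4)  # fixed duration of 4 frames per letter
    return frames


def vocoder(frames: list[float], sample_rate: int = SAMPLE_RATE,
            hop: int = HOP) -> list[float]:
    """Toy vocoder: render each frame as `hop` samples of a sine wave.

    The phase accumulator is carried across frames, so consecutive voiced
    frames with different pitches join without a phase jump.
    """
    samples: list[float] = []
    phase = 0.0
    for f0 in frames:
        for _ in range(hop):
            if f0 > 0.0:
                phase += 2.0 * math.pi * f0 / sample_rate
                samples.append(0.3 * math.sin(phase))
            else:
                samples.append(0.0)  # silent frame
    return samples


def synthesize(text: str) -> list[float]:
    """Chain the three stages: text -> symbols -> frames -> waveform."""
    return vocoder(predictive_model(text_frontend(text)))


if __name__ == "__main__":
    wave = synthesize("Hi 2")
    print(len(wave), "samples")
```

For example, `synthesize("Hi 2")` first normalizes the input to "hi two", giving 5 letters at 4 frames each plus one pause of 2 frames, so 22 frames of 200 samples, or 4400 samples (0.275 s at 16 kHz). Keeping each stage as a separate function mirrors the modular design the article describes, where each component can be replaced independently.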

Publication date: 25 Oct 2023
Paper: https://arxiv.org/pdf/2310.14301