Text-to-Speech

Computation and Language Machine Learning

E3 TTS: Easy End-to-End Diffusion-based Text to Speech

root November 4, 2023 0

The article introduces E3 TTS, a novel end-to-end, diffusion-based text-to-speech model. Unlike previous models, E3 TTS does not rely on intermediate representations such as spectrogram…

Sound

An overview of text-to-speech systems and media applications

root October 25, 2023 0

This article provides a comprehensive overview of Text-to-Speech (TTS) systems and their applications in media. It explores the complexity of designing TTS systems, which typically require a text frontend, a…

Sound

PromptSpeaker: Speaker Generation Based on Text Descriptions

root October 10, 2023 0

The article discusses the development and functionality of PromptSpeaker, a system that uses text prompts to generate custom speaker voices. The PromptSpeaker system consists of a prompt encoder, a zero-shot…

Sound

ReFlow-TTS: A Rectified Flow Model for High-fidelity Text-to-Speech

root October 4, 2023 0

The paper presents ReFlow-TTS, a new method for high-fidelity text-to-speech (TTS) synthesis. Unlike traditional models that require numerous sampling steps, ReFlow-TTS simplifies the process using an Ordinary…

Computation and Language Sound

Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech

root October 4, 2023 0

Modern speech synthesis systems have significantly improved, making synthetic speech indistinguishable from real speech. However, evaluating synthetic speech remains a challenge. Human evaluation using Mean Opinion Score (MOS) is ideal…

Computation and Language Machine Learning

Towards human-like spoken dialogue generation between AI agents from written dialogue

root October 4, 2023 0

The study by Kentaro Mitsui, Yukiya Hono, and Kei Sawada presents CHATS – a system for generating spoken dialogues from written ones between AI agents. Large Language Models (LLMs) have…

Sound

Fewer-token Neural Speech Codec with Time-invariant Codes

root October 4, 2023 0

This research paper presents TiCodec, a new neural speech codec designed to improve the efficiency and effectiveness of language model-based Text-to-Speech (TTS) models. Traditional TTS models…

Sound

Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts

root September 25, 2023 0

This academic paper introduces a novel zero-shot text-to-speech (TTS) model that can replicate the voice of an unseen speaker without the need for adaptation parameters. The model utilizes multi-scale acoustic…

Sound

DurIAN-E: Duration Informed Attention Network For Expressive Text-to-Speech Synthesis

root September 25, 2023 0

This paper introduces DurIAN-E, an improved duration informed attention neural network for expressive and high-quality text-to-speech synthesis. DurIAN-E uses multiple stacked SwishRNN-based Transformer blocks as linguistic encoders and incorporates Style-Adaptive…
