This research paper introduces TiCodec, a neural speech codec designed to make language model-based text-to-speech (TTS) more efficient and effective. Language model-based TTS systems operate on discrete tokens produced by a neural codec, and long token sequences make token prediction harder and slow down inference. TiCodec addresses this by extracting time-invariant information (such as speaker timbre and acoustic conditions) into a single separate utterance-level code, so the frame-level codes only need to carry time-varying content and fewer tokens are required per frame. The study finds that TiCodec reconstructs speech with higher quality using fewer tokens, and that it improves the speaker similarity and naturalness, and lowers the word error rate, of speech synthesized by downstream TTS models.
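The core idea, separating a quantized utterance-level "time-invariant" code from a small number of frame-level residual codes, can be illustrated with a minimal PyTorch sketch. All module names, codebook sizes, and layer choices below are illustrative assumptions for exposition; they are not the paper's actual architecture or training setup.

```python
# Illustrative sketch of the TiCodec idea: frame-level features are quantized
# with a small residual VQ, while one utterance-level ("time-invariant") code
# is obtained by temporal pooling and quantized separately. Hyperparameters
# and module structure are assumptions, not the paper's implementation.
import torch
import torch.nn as nn


class VectorQuantizer(nn.Module):
    """Nearest-neighbor codebook lookup (training losses / straight-through omitted)."""

    def __init__(self, codebook_size: int, dim: int):
        super().__init__()
        self.codebook = nn.Embedding(codebook_size, dim)

    def forward(self, x):                                   # x: (B, T, dim)
        flat = x.reshape(-1, x.size(-1))                    # (B*T, dim)
        dist = torch.cdist(flat, self.codebook.weight)      # (B*T, K)
        indices = dist.argmin(dim=-1).view(x.shape[:-1])    # discrete tokens (B, T)
        return self.codebook(indices), indices


class TiCodecSketch(nn.Module):
    def __init__(self, dim: int = 128, n_frame_quantizers: int = 2):
        super().__init__()
        self.encoder = nn.Conv1d(1, dim, kernel_size=320, stride=320)  # toy frame encoder
        # Fewer frame-level quantizers suffice because timbre / acoustic
        # conditions are carried by the separate time-invariant code.
        self.frame_vqs = nn.ModuleList(
            [VectorQuantizer(1024, dim) for _ in range(n_frame_quantizers)]
        )
        self.global_vq = VectorQuantizer(1024, dim)         # one code per utterance
        self.decoder = nn.ConvTranspose1d(2 * dim, 1, kernel_size=320, stride=320)

    def forward(self, wav):                                 # wav: (B, 1, samples)
        h = self.encoder(wav).transpose(1, 2)               # (B, T, dim)

        # Time-invariant code: pool over time, quantize once per utterance.
        g, g_idx = self.global_vq(h.mean(dim=1, keepdim=True))      # (B, 1, dim)

        # Residual VQ over frame-level features -> the per-frame token stream.
        residual, quantized, frame_idx = h, torch.zeros_like(h), []
        for vq in self.frame_vqs:
            q, idx = vq(residual)
            quantized, residual = quantized + q, residual - q
            frame_idx.append(idx)

        # Decoder conditions frame codes on the broadcast time-invariant code.
        z = torch.cat([quantized, g.expand_as(quantized)], dim=-1)
        return self.decoder(z.transpose(1, 2)), frame_idx, g_idx


if __name__ == "__main__":
    codec = TiCodecSketch()
    audio = torch.randn(1, 1, 3200)                         # 0.2 s at 16 kHz
    recon, frame_tokens, global_token = codec(audio)
    print(recon.shape, len(frame_tokens), global_token.shape)
```

In this sketch, a downstream TTS language model would only need to predict the short frame-level token streams; the single utterance-level token can be taken from a reference utterance, which is the mechanism by which the paper reports fewer tokens with preserved quality.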
Publication date: 4 Oct 2023
Project Page: Not provided
Paper: https://arxiv.org/pdf/2310.00014