The article ‘Accent-VITS: accent transfer for end-to-end TTS’ presents a model for accent transfer in text-to-speech (TTS) systems. The main challenge in accent transfer is to separate speaker timbre from accent effectively. Accent-VITS builds on the VITS architecture and modifies it to make accent transfer efficient and stable. It uses a hierarchical conditional variational autoencoder (CVAE) structure to model accent pronunciation information and acoustic features. According to the article, the model achieves higher speaker similarity, accent similarity and speech naturalness than the models it is compared against. This could improve the user experience of TTS applications by supporting more diverse language environments and user needs.
Publication date: 28 Dec 2023
Project Page: https://anonymous-accentvits.github.io/AccentVITS/
arXiv: 2312.16850v1
Paper: https://arxiv.org/pdf/2312.16850