This paper presents CrossSinger, a cross-lingual singing voice synthesizer based on XiaoiceSing2. The system produces high-fidelity singing voices in multiple languages from monolingual singers. It does this by using the International Phonetic Alphabet (IPA) to unify the phoneme representation across languages and by incorporating language information into the model to improve pronunciation. The system was evaluated on a combination of three singing voice datasets in Japanese, English, and Chinese, and it reportedly performed well, including in code-switching scenarios.
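As a rough illustration of the unified-representation idea, the sketch below maps language-specific phoneme labels to a shared IPA vocabulary and pairs each token with a language ID. It is a minimal sketch under stated assumptions, not the paper's implementation: the phoneme tables, the language ID values, and the names `TO_IPA`, `LANG_IDS`, `build_vocab`, `encode`, and `encode_code_switched` are all hypothetical.

```python
# Hypothetical, heavily abbreviated phoneme-to-IPA tables (assumption:
# illustrative only, not the inventories used in the paper).
TO_IPA = {
    "zh": {"sh": "ʂ", "a": "a", "n": "n"},   # Pinyin-style labels
    "en": {"SH": "ʃ", "AA": "ɑ", "N": "n"},  # ARPABET-style labels
    "ja": {"sh": "ɕ", "a": "a", "N": "ɴ"},   # romaji-style labels
}

# Hypothetical language IDs that a model could embed as language information.
LANG_IDS = {"zh": 0, "en": 1, "ja": 2}


def build_vocab(tables):
    """Collect every IPA symbol across all languages into one shared index."""
    symbols = sorted({ipa for table in tables.values() for ipa in table.values()})
    return {symbol: index for index, symbol in enumerate(symbols)}


IPA_VOCAB = build_vocab(TO_IPA)


def encode(phonemes, lang):
    """Return (ipa_id, lang_id) pairs for one language-tagged phoneme sequence."""
    table = TO_IPA[lang]
    return [(IPA_VOCAB[table[p]], LANG_IDS[lang]) for p in phonemes]


def encode_code_switched(segments):
    """Encode a list of (lang, phonemes) segments as one token sequence."""
    tokens = []
    for lang, phonemes in segments:
        tokens.extend(encode(phonemes, lang))
    return tokens
```

In this sketch, sounds that coincide across languages (such as `n` in the Chinese and English tables) share one IPA ID, while the accompanying language ID still distinguishes which language each token came from, which is what makes a mixed-language sequence representable.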
Publication date: 25 Sep 2023
Project Page: Not Provided
Paper: https://arxiv.org/pdf/2309.12672