The article presents Remixed2Remixed, a new domain adaptation method for speech enhancement. It uses Noise2Noise (N2N) learning to adapt models trained on artificially generated noisy-clean paired data so that they can enhance real-world noisy data. A teacher model trained on out-of-domain data is used to obtain pseudo-in-domain speech and noise signals. Within each batch, these signals are shuffled and remixed twice to generate two bootstrapped mixtures. The student model is then trained with an N2N-based cost function computed from the two mixtures. In tests on the CHiME-7 unsupervised domain adaptation task for conversational speech enhancement, the method outperformed existing systems.
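
The shuffle-and-remix step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the teacher's pseudo speech and noise estimates are already available as plain lists of samples, and that "shuffling" means permuting the noise signals across the batch. The function names `remix_twice` and `n2n_loss` are hypothetical, and the loss is a generic Noise2Noise-style mean squared error (student output on one mixture compared against the other mixture); the cost function in the paper may differ.

```python
import random


def remix_twice(speech, noise, rng):
    """Build two bootstrapped mixtures per batch item.

    speech, noise: lists of equal-length sample lists (pseudo signals that
    a teacher model is assumed to have produced already).
    Each mixture pairs speech[i] with a noise signal drawn by an
    independent random permutation of the batch.
    """
    batch = len(speech)
    perm_a = rng.sample(range(batch), batch)  # first shuffle of noise indices
    perm_b = rng.sample(range(batch), batch)  # second, independent shuffle
    mix_a = [[s + n for s, n in zip(speech[i], noise[perm_a[i]])]
             for i in range(batch)]
    mix_b = [[s + n for s, n in zip(speech[i], noise[perm_b[i]])]
             for i in range(batch)]
    return mix_a, mix_b


def n2n_loss(student, mix_a, mix_b):
    """Generic N2N-style loss (an assumption, not the paper's exact cost):
    mean squared error between student(mix_a[i]) and mix_b[i]."""
    total, count = 0.0, 0
    for a, b in zip(mix_a, mix_b):
        estimate = student(a)
        total += sum((e - t) ** 2 for e, t in zip(estimate, b))
        count += len(b)
    return total / count


# Toy usage: two short "signals" and an identity function as a stand-in student.
rng = random.Random(0)
pseudo_speech = [[1.0, 2.0], [3.0, 4.0]]
pseudo_noise = [[0.5, 0.25], [0.125, 0.75]]
mix_a, mix_b = remix_twice(pseudo_speech, pseudo_noise, rng)
loss = n2n_loss(lambda x: x, mix_a, mix_b)
```

In this sketch both mixtures share the same pseudo speech but may carry different pseudo noise, which is the property an N2N-style objective relies on. In a real system the student would be a neural network and the loss would be minimized by gradient descent.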
Publication date: 29 Dec 2023
Project Page: https://www.cyberagent.co.jp/en/news/detail/id=24461
Paper: https://arxiv.org/pdf/2312.16836