This study examines whether audio augmentation improves self-supervised representation learning (SSRL) models for low-resource languages. The researchers compared augmentation techniques such as pitch variation, noise addition, accented target-language speech, and speech from other languages. The results showed that combined augmentation (noise and pitch) was the most effective, outperforming knowledge transfer from both accented speech and other languages. The findings suggest that for resource-constrained languages, synthetic augmentation can be more beneficial than knowledge transfer from accented or other-language speech.
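The summary does not say how the noise and pitch augmentations were implemented. Below is a minimal pure-Python sketch of what "combined noise/pitch augmentation" can look like, under stated assumptions. It assumes white Gaussian noise mixed at a randomly drawn signal-to-noise ratio (SNR) and a naive pitch shift done by linear-interpolation resampling. The function names (`add_noise_at_snr`, `naive_pitch_shift`, `combined_augment`) and the parameter ranges are hypothetical and do not come from the paper.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, rng):
    """Add white Gaussian noise so the result has the requested SNR in dB."""
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10 ** (snr_db / 10))
    std = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, std) for x in signal]


def naive_pitch_shift(signal, semitones):
    """Shift pitch by resampling with linear interpolation.

    Caveat: like speeding up a tape, this also changes duration by the same
    factor. A duration-preserving shift would need e.g. a phase vocoder.
    """
    factor = 2 ** (semitones / 12)  # frequency ratio per equal-tempered semitone
    out_len = int(len(signal) / factor)
    out = []
    for i in range(out_len):
        pos = i * factor  # fractional read position in the input
        j = int(pos)
        frac = pos - j
        nxt = signal[j + 1] if j + 1 < len(signal) else signal[j]
        out.append(signal[j] * (1 - frac) + nxt * frac)
    return out


def combined_augment(signal, rng, semitone_range=(-3.0, 3.0), snr_range_db=(5.0, 20.0)):
    """Hypothetical combined augmentation: random pitch shift, then random-SNR noise."""
    semitones = rng.uniform(*semitone_range)
    snr_db = rng.uniform(*snr_range_db)
    return add_noise_at_snr(naive_pitch_shift(signal, semitones), snr_db, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    sr = 16000  # assumed 16 kHz sample rate
    tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]  # 1 s, 440 Hz
    augmented = combined_augment(tone, rng)
    print(len(tone), len(augmented))
```

Drawing the pitch offset and SNR fresh for each call means that each pass over the unlabeled audio yields a different variant of the same utterance. This is what makes synthetic augmentation a cheap substitute for collecting more target-language speech.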
Publication date: 25 Sep 2023
Project Page: Not provided
Paper: https://arxiv.org/pdf/2309.12763