Reduce, Reuse, Recycle: Is Perturbed Data better than Other Language augmentation for Low Resource Self-Supervised Speech Models

This study examines how effectively audio augmentation improves self-supervised representation learning (SSRL) models for low-resource languages. The researchers compared several augmentation techniques: pitch variation, noise addition, accented target-language speech, and speech from other languages. A combined noise and pitch augmentation was the most effective, outperforming knowledge transfer from both accented and other-language speech. The findings suggest that for resource-constrained languages, synthetic augmentation can be more beneficial than knowledge transfer from accented speech or other languages.
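
To make the idea of combined noise and pitch perturbation concrete, here is a minimal sketch in Python using only NumPy. It is not the paper's pipeline: the function names, the choice of white Gaussian noise, the SNR value, and the resampling-based pitch change are all assumptions made for illustration. Note that this crude pitch change also alters the clip's duration, which a dedicated pitch shifter would avoid.

```python
import numpy as np


def add_noise(wave, snr_db, rng):
    """Add white Gaussian noise at a target signal-to-noise ratio (in dB).

    Hypothetical helper: the noise type and SNR are illustrative assumptions.
    """
    signal_power = np.mean(wave ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=wave.shape)
    return wave + noise


def shift_pitch_by_resampling(wave, semitones):
    """Raise or lower pitch by resampling with linear interpolation.

    Crude stand-in for a real pitch shifter: the output is shorter for
    positive shifts and longer for negative ones.
    """
    factor = 2 ** (semitones / 12)          # frequency ratio for the shift
    new_len = int(round(len(wave) / factor))
    old_idx = np.arange(len(wave))
    new_idx = np.linspace(0, len(wave) - 1, new_len)
    return np.interp(new_idx, old_idx, wave)


def augment(wave, snr_db=10.0, semitones=2.0, seed=0):
    """Apply the pitch perturbation first, then add noise."""
    rng = np.random.default_rng(seed)
    shifted = shift_pitch_by_resampling(wave, semitones)
    return add_noise(shifted, snr_db, rng)


if __name__ == "__main__":
    sample_rate = 16000
    t = np.arange(sample_rate) / sample_rate
    clean = np.sin(2 * np.pi * 440.0 * t)   # one second of a 440 Hz tone
    perturbed = augment(clean)
    print(len(clean), len(perturbed))
```

Each perturbed copy would be added to the original low-resource training set, so the model sees more acoustic variety without requiring any new recordings.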

Publication date: 25 Sep 2023
Project Page: Not provided
Paper: https://arxiv.org/pdf/2309.12763
