The paper addresses the challenge of automatically recognizing dysarthric speech, which results from neuro-motor conditions and physical disabilities. It presents a comparative study of data augmentation approaches for improving pre-trained Automatic Speech Recognition (ASR) models on dysarthric speech. The approaches compared are conventional speaker-independent perturbation, speaker-dependent speed perturbation, and a novel Spectral basis GAN-based adversarial data augmentation. The experiments suggest that GAN-based data augmentation consistently outperforms the other approaches. The study aims to address data scarcity in dysarthric speech recognition and suggests alternative solutions such as speech foundation models based on self-supervised learning.
Publication date: 4 Jan 2024
Project Page: Not provided
Paper: https://arxiv.org/pdf/2401.00662