Enhancing Pre-trained ASR System Fine-tuning for Dysarthric Speech Recognition using Adversarial Data Augmentation

The paper addresses automatic recognition of dysarthric speech, which remains difficult because neuro-motor conditions and physical disabilities make it hard to collect training data at scale. It presents a comparative study of data augmentation approaches for fine-tuning pre-trained Automatic Speech Recognition (ASR) models on dysarthric speech. The approaches compared are conventional speaker-independent perturbation, speaker-dependent speed perturbation, and a novel spectral basis GAN-based adversarial data augmentation. The experiments suggest that GAN-based data augmentation consistently outperforms the other approaches compared. The study targets the data scarcity problem in dysarthric speech recognition and also points to self-supervised learning (SSL) based speech foundation models as a complementary solution.
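To make the simplest of these approaches concrete, here is a minimal sketch of speed perturbation by linear-interpolation resampling. It is an illustration, not the paper's implementation: the function names `speed_perturb` and `augment` are hypothetical, and the default factors 0.9, 1.0 and 1.1 are a common convention in ASR augmentation rather than values taken from this summary. A speaker-dependent variant would pass a factor chosen per speaker instead of the fixed set.

```python
def speed_perturb(samples, factor):
    """Return `samples` resampled so that it plays `factor` times faster.

    factor > 1 shortens the signal, factor < 1 lengthens it.
    Uses plain linear interpolation (an assumption made for brevity;
    production toolkits typically use higher-quality resampling filters).
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not samples:
        return []
    n_in = len(samples)
    n_out = max(1, round(n_in / factor))
    out = []
    for i in range(n_out):
        # Position in the input signal that output sample i is read from,
        # clamped so we never read past the final input sample.
        pos = min(i * factor, n_in - 1)
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        frac = pos - lo
        out.append((1 - frac) * samples[lo] + frac * samples[hi])
    return out


def augment(samples, factors=(0.9, 1.0, 1.1)):
    """Create one perturbed copy of `samples` per speed factor (hypothetical helper)."""
    return {f: speed_perturb(samples, f) for f in factors}
```

Calling `augment(waveform)` on a list of float samples yields three versions of the utterance, roughly 11% longer, unchanged, and 9% shorter, which can all be added to the fine-tuning set with the original transcript.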

Publication date: 4 Jan 2024
Project Page: Not provided
Paper: https://arxiv.org/pdf/2401.00662
