The paper presents DistilWhisper, a method for improving the performance of the Whisper speech model on under-represented languages. Whisper is a multilingual, multitask speech model covering 99 languages; despite its robustness and widespread use, it underperforms on several of them. DistilWhisper combines lightweight, modular ASR fine-tuning with knowledge distillation from a larger Whisper model. The approach adds negligible overhead at inference and outperforms both standard fine-tuning and LoRA adapters.
Publication date: 3 Nov 2023
Project Page: N/A
Paper: https://arxiv.org/pdf/2311.01070
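
To make the training recipe more concrete, below is a minimal PyTorch sketch of the kind of joint objective the summary describes: an ASR cross-entropy term on the student combined with a knowledge-distillation term pulling the student toward a frozen, larger teacher. The function name, temperature, and loss weighting are illustrative assumptions, not values taken from the paper.

```python
import torch
import torch.nn.functional as F

def joint_asr_kd_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5, pad_token_id=-100):
    """Sketch of a joint objective: ASR cross-entropy on the student plus a
    KL term toward the (frozen) teacher's token distribution.
    Hyper-parameter values here are illustrative defaults, not the paper's."""
    # Standard ASR fine-tuning loss; padded label positions (-100) are ignored.
    ce_loss = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=pad_token_id,
    )

    # Knowledge distillation: KL divergence between temperature-softened
    # teacher and student distributions over the vocabulary.
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Weighted combination of the ASR and distillation terms.
    return alpha * ce_loss + (1.0 - alpha) * kd_loss
```

In such a setup, the teacher would run under `torch.no_grad()` and only the student's lightweight, language-specific modules would receive gradients, which is what keeps the added overhead negligible in the approach described above.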