The article presents DistilWhisper, a method for improving the automatic speech recognition (ASR) performance of Whisper, particularly on under-represented languages. Whisper is a multilingual, multitask speech model covering 99 languages; despite its robustness and widespread adoption, it underperforms on a number of them. DistilWhisper addresses this by combining lightweight, modular ASR fine-tuning of a smaller Whisper model with knowledge distillation from a larger one. The approach adds negligible inference overhead and proves more effective than standard fine-tuning or LoRA adapters.
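As a rough illustration of the distillation side of the recipe, the sketch below combines a cross-entropy loss on ground-truth transcripts with a KL term that pulls a whisper-small student toward a whisper-large-v2 teacher. This is a minimal sketch using Hugging Face Whisper checkpoints, not the authors' implementation; in particular, DistilWhisper's language-specific (CLSR) modules are omitted, and the `distillation_step`, `temperature`, and `alpha` names are illustrative assumptions.

```python
# Hedged sketch: distilling whisper-large-v2 into whisper-small for one
# under-represented language. Omits DistilWhisper's language-specific modules.
import torch
import torch.nn.functional as F
from transformers import WhisperForConditionalGeneration

student = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
teacher = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v2")
teacher.eval()

def distillation_step(batch, temperature=2.0, alpha=0.5):
    """One training step: CE loss on labels plus a KL term that pushes the
    student's token distribution toward the teacher's (assumed loss weighting)."""
    input_features = batch["input_features"]   # log-mel spectrogram features
    labels = batch["labels"]                   # target transcript token ids

    # Student forward pass: cross-entropy against the ground-truth transcript.
    student_out = student(input_features=input_features, labels=labels)
    ce_loss = student_out.loss

    # Teacher forward pass on the same inputs and targets, no gradients.
    with torch.no_grad():
        teacher_out = teacher(input_features=input_features, labels=labels)

    # KL divergence between temperature-softened output distributions.
    kd_loss = F.kl_div(
        F.log_softmax(student_out.logits / temperature, dim=-1),
        F.softmax(teacher_out.logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    return alpha * ce_loss + (1 - alpha) * kd_loss
```

Because both checkpoints share the same tokenizer and vocabulary, the student and teacher logits can be compared position by position; only the student's parameters receive gradients.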


Publication date: 3 Nov 2023
Project Page: N/A
Paper: https://arxiv.org/pdf/2311.01070