The article presents Distil-Whisper, a distilled variant of the Whisper model for speech recognition. To address the challenges of running large models in low-latency or resource-constrained environments, the researchers used pseudo-labelling to assemble a large-scale open-source dataset, applying a simple word error rate (WER) heuristic to select only the highest-quality pseudo-labels for training. The resulting Distil-Whisper model is 5.8 times faster with 51% fewer parameters, while performing to within about 1% WER of the original Whisper model on out-of-distribution test data and remaining robust to difficult acoustic conditions. It is also less prone to hallucination errors on long-form audio and is designed to be paired with Whisper for speculative decoding, where the distilled model drafts tokens that the full model then verifies.
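The sketch below illustrates the kind of WER-based filtering heuristic described above, assuming each Whisper pseudo-label is compared against the dataset's original transcription and that pairs whose WER exceeds a chosen threshold are discarded. The helper names (`word_error_rate`, `filter_pseudo_labels`), the lowercase normalisation, and the 10% threshold are illustrative assumptions, not the paper's exact implementation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance over word sequences.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(substitution, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)


def filter_pseudo_labels(examples, wer_threshold=0.10):
    """Keep (transcript, pseudo_label) pairs whose pseudo-label closely matches the transcript.

    The 10% threshold and lowercase normalisation are assumptions for illustration.
    """
    kept = []
    for original_text, pseudo_label in examples:
        if word_error_rate(original_text.lower(), pseudo_label.lower()) <= wer_threshold:
            kept.append((original_text, pseudo_label))
    return kept
```

For example, `filter_pseudo_labels([("the cat sat", "the cat sat"), ("the cat sat", "a dog ran off")])` keeps only the first pair, since the second has a WER well above the threshold.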

Publication date: 1 Nov 2023
Project Page: https://arxiv.org/abs/2311.00430v1
Paper: https://arxiv.org/pdf/2311.00430