September 25, 2023

Big model only for hard audios: Sample dependent Whisper model selection for efficient inferences

The paper addresses a side effect of recent progress in Automatic Speech Recognition (ASR): models keep growing, and inference becomes more expensive as they do. The authors propose choosing, for each audio sample, the smallest model that is sufficient to transcribe it, which reduces computation without significantly affecting recognition performance. They apply the approach to two Whisper models of different sizes. The idea rests on the observation that smaller models often already perform well on large parts of test corpora, so the larger model is only needed for the harder audio.
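The per-sample selection idea can be illustrated with a minimal routing sketch. It is not the authors' implementation: the `Router` class, the `difficulty` scoring function, the `threshold` value, and the `relative_cost` helper are all hypothetical names introduced here, and the two transcribers are plain callables standing in for a small and a large Whisper model.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Audio = Sequence[float]                 # raw samples; a stand-in for real audio input
Transcriber = Callable[[Audio], str]    # any function that maps audio to text


@dataclass
class Router:
    """Send each sample to the small model unless it is judged hard.

    All fields are illustrative assumptions, not part of the paper's code.
    """
    small: Transcriber
    large: Transcriber
    difficulty: Callable[[Audio], float]  # hypothetical score, higher means harder
    threshold: float = 0.5                # hypothetical cut-off, to be tuned

    def transcribe(self, audio: Audio) -> Tuple[str, str]:
        # Decide first, then run exactly one of the two models.
        use_large = self.difficulty(audio) >= self.threshold
        model = self.large if use_large else self.small
        return model(audio), ("large" if use_large else "small")


def relative_cost(choices: Sequence[str], small_cost: float, large_cost: float) -> float:
    """Cost of the routed run divided by the cost of always using the large model.

    Ignores the cost of the difficulty estimate itself, which a real
    system would have to include.
    """
    if not choices:
        raise ValueError("choices must not be empty")
    total = sum(large_cost if c == "large" else small_cost for c in choices)
    return total / (len(choices) * large_cost)
```

With stub models, a router built as `Router(small=..., large=..., difficulty=...)` returns both the transcript and the name of the model that produced it, so the list of choices can be passed to `relative_cost` to estimate how much computation the routing saves for a given mix of easy and hard samples.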

Publication date: 25 Sep 2023
Project Page: N/A
Paper: https://arxiv.org/pdf/2309.12712


automatic speech recognition, computation efficiency, model size, multi-talker ASR, Whisper model
