September 25, 2023

Big model only for hard audios: Sample dependent Whisper model selection for efficient inferences

The paper addresses a trend in Automatic Speech Recognition (ASR): recent progress has come with ever larger models, which makes inference increasingly expensive. The authors propose using, for each audio sample, the smallest model that is sufficient to transcribe it. This saves computation while only slightly reducing performance. They apply this approach to two Whisper models of different sizes. The idea rests on the observation that smaller models already perform optimally on large parts of the test corpora.
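The summary does not say how the per-sample choice is made, so the following is only a minimal sketch of the routing idea under stated assumptions. `small_ok` is a hypothetical decision score estimating whether the small model is sufficient for a sample. `small_asr` and `large_asr` are placeholder callables standing in for the two Whisper models, not the Whisper API. The training-label rule (small model counts as sufficient when its per-sample WER is no worse than the large model's) and all cost figures are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, TypeVar

Audio = TypeVar("Audio")  # whatever representation the decision score consumes


@dataclass
class Routed:
    transcript: str
    used_large: bool


def oracle_small_is_enough(wer_small: Sequence[float],
                           wer_large: Sequence[float],
                           tol: float = 0.0) -> List[bool]:
    """Hypothetical training labels: True where the small model's WER on a
    sample is no worse than the large model's (up to `tol`)."""
    return [ws <= wl + tol for ws, wl in zip(wer_small, wer_large)]


def route(samples: Sequence[Audio],
          small_ok: Callable[[Audio], float],
          small_asr: Callable[[Audio], str],
          large_asr: Callable[[Audio], str],
          threshold: float = 0.5) -> List[Routed]:
    """Send each sample to the small model when the (hypothetical) decision
    score says it is sufficient, otherwise fall back to the large model."""
    out: List[Routed] = []
    for audio in samples:
        if small_ok(audio) >= threshold:
            out.append(Routed(small_asr(audio), used_large=False))
        else:
            out.append(Routed(large_asr(audio), used_large=True))
    return out


def relative_cost(results: Sequence[Routed], small_cost: float,
                  large_cost: float, decision_cost: float = 0.0) -> float:
    """Total inference cost (decision + chosen model per sample) divided by
    the cost of always running the large model."""
    total = sum(decision_cost + (large_cost if r.used_large else small_cost)
                for r in results)
    return total / (len(results) * large_cost)


if __name__ == "__main__":
    # Toy stand-in for audio: a made-up "difficulty" value per sample.
    difficulty = [0.1, 0.9, 0.3, 0.8]
    res = route(difficulty, lambda d: 1.0 - d,
                lambda d: f"small({d})", lambda d: f"large({d})")
    print([r.used_large for r in res])  # [False, True, False, True]
    # Illustrative costs only, not measured Whisper numbers.
    print(relative_cost(res, small_cost=1.0, large_cost=4.0, decision_cost=0.1))
```

The savings come from easy samples that never reach the large model. The sketch also charges the decision step on every sample, because the decision has to stay cheap for the routing to pay off.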

Publication date: 25 Sep 2023
Project Page: N/A
Paper: https://arxiv.org/pdf/2309.12712

