The paper addresses recent advances in Automatic Speech Recognition (ASR), in particular the growth in model size and its cost in computational efficiency. The authors propose selecting, for each audio sample, the smallest model that is sufficient to transcribe it, which improves computational efficiency without significantly degrading performance. They apply this approach to two Whisper models of different sizes. The idea rests on the observation that smaller models often already perform optimally on large parts of test corpora.
Publication date: 25 Sep 2023
Project Page: N/A
Paper: https://arxiv.org/pdf/2309.12712
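
The per-sample selection idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not describe how the decision is made, so the decision rule here is a caller-supplied function, and the names SmallestSufficientRouter, needs_large, and small_model_share are hypothetical, as are the two model callables, which stand in for a smaller and a larger Whisper model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Audio = Sequence[float]  # raw waveform samples; a placeholder type for this sketch


@dataclass
class SmallestSufficientRouter:
    """Routes each sample to the small model unless the decision rule objects."""

    small_model: Callable[[Audio], str]   # hypothetical: cheaper transcriber
    large_model: Callable[[Audio], str]   # hypothetical: costlier transcriber
    needs_large: Callable[[Audio], bool]  # hypothetical: per-sample decision rule

    def transcribe(self, audio: Audio) -> Tuple[str, str]:
        """Return (model_used, transcript) for one audio sample."""
        if self.needs_large(audio):
            return "large", self.large_model(audio)
        return "small", self.small_model(audio)

    def transcribe_batch(self, batch: Sequence[Audio]) -> List[Tuple[str, str]]:
        """Route every sample in the batch independently."""
        return [self.transcribe(audio) for audio in batch]


def small_model_share(results: Sequence[Tuple[str, str]]) -> float:
    """Fraction of samples handled by the small model (a rough proxy for savings)."""
    if not results:
        return 0.0
    return sum(1 for used, _ in results if used == "small") / len(results)
```

In use, small_model and large_model would wrap two real transcribers, and needs_large would be whatever decision rule is available for judging whether a sample is hard. The share returned by small_model_share only indicates how often the cheaper path was taken; the actual savings depend on the relative cost of the two models and of the decision rule itself.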