Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong General Audio Event Taggers
This paper focuses on Whisper, a recent automatic speech recognition model trained on a massive 680k-hour labeled speech corpus recorded in diverse conditions. The authors show an interesting finding…