The paper presents the Open Whisper-style Speech Model (OWSM), a model that reproduces Whisper-style training using an open-source toolkit and publicly available data. OpenAI's Whisper is a multilingual, multitask model trained on 680k hours of supervised speech data, but its full development pipeline, from data collection to training, is not publicly accessible. OWSM follows the Whisper design and supports tasks such as language identification, multilingual automatic speech recognition, and utterance-level segmentation. Compared with Whisper, OWSM supports more translation directions and can be more efficient to train. To promote open science, the authors will release all scripts used for data preparation, training, inference, and scoring, along with pre-trained models and training logs.
Publication date: 26 Sep 2023
Project Page: https://github.com/espnet/espnet
Paper: https://arxiv.org/pdf/2309.13876