Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data

This paper presents the Open Whisper-style Speech Model (OWSM), which reproduces Whisper-style training using an open-source toolkit and publicly available data. OpenAI's Whisper is a multilingual, multitask model trained on 680k hours of supervised speech data, but its full development pipeline, from data collection to training, is not publicly accessible. OWSM follows the Whisper design and supports tasks such as language identification, multilingual automatic speech recognition, and utterance-level segmentation. Unlike Whisper, OWSM supports more translation directions and can be more efficient to train. The authors plan to release all scripts used for data preparation, training, inference, and scoring, as well as pre-trained models and training logs, to promote open science.

Publication date: 26 Sep 2023
Project Page: https://github.com/espnet/espnet
Paper: https://arxiv.org/pdf/2309.13876
