The article discusses Fast-HuBERT, an efficient training framework for self-supervised learning (SSL) speech recognition models. The authors address the high computational cost of existing SSL models, which has been a barrier to their wider application and further research. Fast-HuBERT tackles this by optimizing the training pipeline, achieving a 5.2x speedup without performance degradation: the model can be trained on the LibriSpeech 960h benchmark with 8 V100 GPUs in just 1.1 days. The authors also integrate two well-studied techniques into Fast-HuBERT and demonstrate consistent further improvements.
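To give a feel for where such speedups come from, here is a minimal sketch (not taken from the paper) of why shortening the input sequence accelerates Transformer-based SSL training: self-attention cost grows quadratically with sequence length, so halving the frame rate of the acoustic front-end roughly quarters the attention FLOPs per utterance. The frame rates, hidden size, and layer count below are illustrative assumptions, not values reported by Fast-HuBERT.

```python
def attention_flops(seq_len: int, hidden: int, layers: int) -> int:
    """Rough per-utterance FLOP estimate for self-attention:
    the QK^T scores and the attention-weighted sum over V are
    each on the order of seq_len^2 * hidden per layer."""
    return layers * 2 * seq_len * seq_len * hidden

# Hypothetical comparison: a 10 s utterance at a 50 Hz frame rate
# (typical of a HuBERT-style CNN front-end) vs. a 25 Hz frame rate
# after downsampling the input features.
base = attention_flops(seq_len=500, hidden=768, layers=12)
fast = attention_flops(seq_len=250, hidden=768, layers=12)
print(f"attention FLOPs ratio: {base / fast:.1f}x")  # -> 4.0x
```

This back-of-the-envelope arithmetic only covers self-attention; the end-to-end speedup a framework like Fast-HuBERT reports also depends on the feed-forward layers, the front-end, and data loading, so it will not equal the quadratic ratio exactly.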
Publication date: 26 Sep 2023
Project Page: https://github.com/yanghaha0908/FastHuBERT
Paper: https://arxiv.org/pdf/2309.13860