The article examines the impact of batch size on pre-training in self-supervised speech representation learning. The study pre-trains models with a range of batch sizes and observes that larger batch sizes yield better pre-trained models, provided training remains stable and the learning objective remains effective at that batch size. The quality of the pre-trained model is found to depend mainly on the total amount of speech seen during training, which is the product of batch size and the number of iterations. These insights can help researchers choose effective operating conditions when studying self-supervised learning for speech.
Publication date: 23 Feb 2024
Project Page: https://github.com/nikvaessen/w2v2-batch-size
Paper: https://arxiv.org/pdf/2402.13723
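
To make the "total amount of speech seen" notion concrete, here is a minimal sketch (not taken from the paper or its repository) of the underlying arithmetic: total audio seen is the batch size, measured in audio time per update step, multiplied by the number of steps. The batch sizes and step count below are illustrative placeholders, not values reported in the study.

```python
def total_speech_seen_hours(batch_size_seconds: float, num_iterations: int) -> float:
    """Total audio observed during pre-training, in hours.

    batch_size_seconds: amount of audio in one batch (seconds).
    num_iterations: number of update steps.
    """
    return batch_size_seconds * num_iterations / 3600


if __name__ == "__main__":
    num_iterations = 400_000  # illustrative number of update steps
    for batch_size_minutes in (5, 30, 90):  # illustrative batch sizes, in minutes of audio
        hours = total_speech_seen_hours(batch_size_minutes * 60, num_iterations)
        print(
            f"batch = {batch_size_minutes:>3} min/step, "
            f"steps = {num_iterations:,} -> {hours:,.0f} h of speech seen"
        )
```

Under this reading, halving the batch size while doubling the number of iterations keeps the total speech seen constant, which is the trade-off the summary points to when it says model quality depends mainly on that product.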