The article presents Self-supervised Reflective Learning (SSRL), a novel paradigm for speaker representation learning. SSRL integrates self-supervised knowledge distillation with online clustering, refining pseudo labels continuously during training and thereby avoiding the train-cluster-retrain bottleneck of iterative pseudo-labeling pipelines. A teacher model refines pseudo labels through online clustering and provides dynamic supervision signals to a student model, which undergoes noisy student training to enhance its modeling capacity. The teacher is updated as an exponential moving average (EMA) of the student's weights, so it acts as an ensemble of past student iterations. SSRL outperforms current iterative approaches, surpassing a five-round method in just a single training round.
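To make the teacher-student mechanics concrete, below is a minimal PyTorch sketch of the EMA teacher update described above. The momentum value, the stand-in encoder, and the function name are illustrative assumptions, not details taken from the paper.

```python
import copy
import torch

@torch.no_grad()
def update_teacher(teacher: torch.nn.Module,
                   student: torch.nn.Module,
                   momentum: float = 0.999) -> None:
    """Update teacher weights as an EMA of the student's weights.

    The teacher thereby acts as an ensemble of past student iterations.
    The momentum value is an assumed hyperparameter, not from the paper.
    """
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(momentum).add_(s_param, alpha=1.0 - momentum)

# Hypothetical usage: the teacher starts as a frozen copy of the student
# and is refreshed after each student optimizer step.
student = torch.nn.Linear(80, 256)   # stand-in for a speaker encoder
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)

update_teacher(teacher, student)
```

Because only the student receives gradients, the EMA update keeps the teacher's supervision signal smoother and more stable than the student's raw, noisier trajectory.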

Publication date: 4 Jan 2024
Project Page: Not provided
Paper: https://arxiv.org/pdf/2401.01473