The article introduces a new loss function for training deep neural networks for speaker recognition, based on the Jeffreys divergence (the symmetrized Kullback-Leibler divergence). Added to the cross-entropy loss, this divergence maximizes the target value of the output distribution while smoothing the non-target values. The authors show that the resulting loss yields highly discriminative features and outperforms state-of-the-art speaker recognition systems, especially on out-of-domain data. The article also gives a theoretical justification for the effectiveness of this loss function and analyzes its impact on different types of datasets.
Publication date: 29 Dec 2023
Project Page: Not provided
Paper: https://arxiv.org/pdf/2312.16885
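
As a rough illustration of the idea in the summary, the sketch below adds a Jeffreys divergence term to a standard cross-entropy loss for a single example. It is not the paper's exact formulation: the smoothed reference distribution (`smoothed_target`), the weight `lam`, the smoothing amount `eps`, and all function names are assumptions made for this example. Only the definition of the Jeffreys divergence, J(p, q) = KL(p‖q) + KL(q‖p), is standard.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def jeffreys_divergence(p, q):
    """J(p, q) = KL(p||q) + KL(q||p) = sum_i (p_i - q_i) * log(p_i / q_i).

    Both distributions must be strictly positive.
    """
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))

def smoothed_target(num_classes, target, eps=0.1):
    """Hypothetical reference distribution (an assumption of this sketch):
    mass 1 - eps on the target class, eps spread evenly over the others."""
    off = eps / (num_classes - 1)
    return [1.0 - eps if i == target else off for i in range(num_classes)]

def ce_plus_jeffreys(logits, target, lam=0.5, eps=0.1):
    """Cross-entropy plus a weighted Jeffreys term (illustrative only)."""
    probs = softmax(logits)
    ce = -math.log(probs[target])
    ref = smoothed_target(len(logits), target, eps)
    return ce + lam * jeffreys_divergence(ref, probs)

# Two outputs with the same target probability (0.7) but different
# non-target shapes; using log-probabilities as logits recovers them exactly.
flat = [math.log(x) for x in (0.7, 0.15, 0.15)]
peaked = [math.log(x) for x in (0.7, 0.29, 0.01)]
loss_flat = ce_plus_jeffreys(flat, target=0)
loss_peaked = ce_plus_jeffreys(peaked, target=0)
print(loss_flat, loss_peaked)
```

In this sketch the cross-entropy term is identical for both outputs, so any difference in the loss comes from the Jeffreys term, which penalizes the output whose non-target probabilities are unevenly spread. This matches the "smoothing the non-target values" behavior described above, under the assumed reference distribution.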