The article introduces VOT, a new model that aims to revolutionize speaker verification. It proposes a Memory-Attention framework that incorporates a deep feed-forward sequential memory network (DFSMN) into a self-attention mechanism, capturing long-term context while enhancing the modeling of local dependencies. The VOT model combines a parallel structure fused by variable-weight summation with an attention-based statistical pooling layer. The authors also propose a new loss function, AM-Softmax-Focal, to address the hard-sample mining problem. On the VoxCeleb1 dataset, VOT achieved significant improvements, outperforming most mainstream models.
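
To illustrate the Memory-Attention idea, here is a minimal PyTorch sketch in which a DFSMN-style memory branch (modeled here as a depthwise 1-D convolution acting as learnable memory taps over nearby frames) runs in parallel with multi-head self-attention, and the two branches are fused by a learnable variable-weight summation. The kernel size, head count, and exact fusion scheme are assumptions, not the paper's reported configuration.

```python
import torch
import torch.nn as nn

class MemoryAttentionBlock(nn.Module):
    """Sketch of a Memory-Attention block: a DFSMN-style memory branch
    in parallel with self-attention, fused by variable-weight summation.
    Kernel size, head count, and fusion scheme are assumptions."""

    def __init__(self, dim, heads=4, context=11):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Depthwise conv ~ per-channel memory taps over past/future frames.
        self.memory = nn.Conv1d(dim, dim, context, padding=context // 2, groups=dim)
        self.alpha = nn.Parameter(torch.ones(2))  # learnable branch weights

    def forward(self, x):                      # x: (B, T, D)
        a, _ = self.attn(x, x, x)              # global-context branch
        m = self.memory(x.transpose(1, 2)).transpose(1, 2)  # local memory branch
        w = torch.softmax(self.alpha, dim=0)   # normalized fusion weights
        return x + w[0] * a + w[1] * m         # residual variable-weight sum
```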
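
The attention-based statistical pooling layer can be sketched as follows: frame-level features are reduced to a fixed utterance-level embedding by concatenating an attention-weighted mean and standard deviation over time. Layer sizes here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AttentiveStatsPooling(nn.Module):
    """Minimal sketch of attention-based statistical pooling.
    Hidden size is an illustrative assumption."""

    def __init__(self, dim, hidden=128):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):                       # x: (B, T, D)
        w = torch.softmax(self.attn(x), dim=1)  # attention over time, (B, T, 1)
        mean = (w * x).sum(dim=1)               # weighted mean, (B, D)
        var = (w * (x - mean.unsqueeze(1)) ** 2).sum(dim=1)
        std = (var + 1e-8).sqrt()               # weighted std, (B, D)
        return torch.cat([mean, std], dim=1)    # utterance embedding, (B, 2D)
```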
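
AM-Softmax-Focal plausibly combines the additive-margin softmax objective with a focal re-weighting term that emphasizes hard samples. The sketch below follows that reading; the scale `s`, margin `m`, and focusing parameter `gamma` are assumed hyper-parameters, and the paper's exact formulation may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AMSoftmaxFocalLoss(nn.Module):
    """Sketch of AM-Softmax with focal re-weighting.
    s, m, and gamma are assumed hyper-parameters."""

    def __init__(self, embed_dim, num_speakers, s=30.0, m=0.2, gamma=2.0):
        super().__init__()
        # Class weight vectors, L2-normalized at forward time.
        self.weight = nn.Parameter(torch.randn(num_speakers, embed_dim))
        self.s, self.m, self.gamma = s, m, gamma

    def forward(self, embeddings, labels):
        # Cosine similarity between normalized embeddings and class weights.
        cos = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        # Subtract the additive margin from the target-class cosine only.
        onehot = F.one_hot(labels, cos.size(1)).float()
        logits = self.s * (cos - self.m * onehot)
        # Focal term down-weights easy samples, emphasizing hard ones.
        logp_t = F.log_softmax(logits, dim=1).gather(1, labels.unsqueeze(1)).squeeze(1)
        loss = -((1.0 - logp_t.exp()) ** self.gamma) * logp_t
        return loss.mean()
```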

Publication date: 29 Dec 2023
Project Page: Not provided
Paper: https://arxiv.org/pdf/2312.16826