Deep Neural Networks for Automatic Speaker Recognition Do Not Learn Supra-Segmental Temporal Features

The study presents a novel test that measures how much of a neural network's speaker recognition performance can be attributed to modeling supra-segmental temporal (SST) features. It finds that various CNN- and RNN-based architectures do not sufficiently model SST, even when forced to. The findings provide a basis for further research into exploiting the full speech signal, and they offer insight into how these networks work, improving the explainability of deep learning for speech technologies.
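The summary above does not describe how the test works. As a purely illustrative sketch, assume such a test compares a model's score on intact speech with its score on speech whose long-range temporal order has been scrambled. The helper names `shuffle_segments` and `relative_drop`, the fixed-length segmentation, and the scoring scheme are all hypothetical and are not taken from the paper.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shuffle_segments(frames: Sequence[T], segment_len: int, seed: int = 0) -> List[T]:
    """Permute fixed-length segments of a frame sequence.

    Frame order inside each segment is kept, so short-range (segmental)
    content survives while long-range (supra-segmental) order is destroyed.
    Hypothetical helper for illustration only.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    # Cut the sequence into consecutive chunks; the last one may be shorter.
    segments = [
        list(frames[i:i + segment_len])
        for i in range(0, len(frames), segment_len)
    ]
    # Seeded generator so the scrambling is reproducible.
    random.Random(seed).shuffle(segments)
    return [frame for segment in segments for frame in segment]


def relative_drop(score_original: float, score_shuffled: float) -> float:
    """Fraction of the original score lost after scrambling (assumed metric)."""
    if score_original == 0:
        raise ValueError("score_original must be non-zero")
    return (score_original - score_shuffled) / score_original
```

Under these assumptions, a relative drop close to zero would suggest that a model's decisions depend little on long-range temporal order, which is the kind of result the summary reports for the CNN- and RNN-based architectures.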

Publication date: 2 Nov 2023
Project Page: https://arxiv.org/abs/2311.00489v2
Paper: https://arxiv.org/pdf/2311.00489

Deep Neural Networks, Explainability, Speaker Recognition, Speech Technologies, Supra-Segmental Temporal Features
