The article introduces DSNet, a Disentangled Siamese Network with neutral calibration for speech emotion recognition (SER). DSNet targets a known difficulty in SER: models tend to encode emotion-irrelevant factors unintentionally alongside emotional cues. The network uses an orthogonal feature disentanglement module to split the high-level representation into two distinct subspaces, together with a novel neutral calibration mechanism that captures the emotion-irrelevant information. Separating out that information lets the model isolate and emphasize the emotion-relevant content of the speech signal. Experimental results show that DSNet outperforms other state-of-the-art methods on speaker-independent SER.
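
To make the idea of orthogonal feature disentanglement concrete, here is a minimal sketch of one common way to penalize overlap between two feature subspaces. It is not taken from the paper: the half-and-half split, the function names `split_representation` and `orthogonality_penalty`, and the choice of a squared Frobenius norm are all assumptions made for illustration, and the summary above does not say which loss DSNet actually uses.

```python
from typing import List, Tuple

Vector = List[float]


def split_representation(h: Vector) -> Tuple[Vector, Vector]:
    """Split a feature vector into two equal halves.

    Assumption for illustration: the first half is treated as the
    emotion-relevant part, the second half as the emotion-irrelevant part.
    """
    if len(h) % 2 != 0:
        raise ValueError("feature dimension must be even")
    mid = len(h) // 2
    return h[:mid], h[mid:]


def orthogonality_penalty(batch: List[Vector]) -> float:
    """Squared Frobenius norm of E^T N computed over a batch.

    E stacks the emotion-relevant halves row by row and N stacks the
    emotion-irrelevant halves. The penalty is zero when every column of E
    is orthogonal to every column of N across the batch.
    """
    pairs = [split_representation(h) for h in batch]
    d = len(pairs[0][0])
    penalty = 0.0
    for i in range(d):
        for j in range(d):
            # Entry (i, j) of E^T N: inner product over the batch of
            # emotion dimension i with irrelevant dimension j.
            entry = sum(e[i] * n[j] for e, n in pairs)
            penalty += entry * entry
    return penalty


if __name__ == "__main__":
    overlapping = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 0.0, 0.0]]
    disjoint = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
    print(orthogonality_penalty(overlapping))
    print(orthogonality_penalty(disjoint))
```

In a training setup, a term like this would typically be added to the classification loss with a weighting factor, so that the two subspaces are pushed apart while the emotion-relevant half is still trained to predict the label.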

Publication date: 25 Dec 2023
Project Page: https://arxiv.org/abs/2312.15593v1
Paper: https://arxiv.org/pdf/2312.15593