Unsupervised Representations Improve Supervised Learning in Speech Emotion Recognition

This paper addresses Speech Emotion Recognition (SER), which helps human-computer interaction systems infer a speaker's emotional state. The authors propose combining self-supervised feature extraction, using the Wav2Vec model, with a supervised classifier that recognizes emotion from short audio segments. They report that the method outperforms two baselines: a support vector machine classifier and transfer learning from a pre-trained Convolutional Neural Network (CNN). The study thus highlights the value of deep unsupervised feature learning for improving SER and emotional understanding in human-computer interaction.
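The two-stage idea described above (frozen, pre-learned features followed by a supervised classifier) can be illustrated with a minimal sketch. It is not the authors' implementation: the frame embeddings are hypothetical two-dimensional placeholders standing in for Wav2Vec outputs, the emotion labels are made up for illustration, and a simple nearest-centroid classifier is swapped in because this summary does not specify the classifier the paper uses.

```python
from collections import defaultdict
from math import dist


def mean_pool(frames):
    """Average frame-level embeddings into one fixed-size segment vector."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


class NearestCentroidClassifier:
    """Toy supervised classifier (an assumption): one centroid per label."""

    def fit(self, vectors, labels):
        grouped = defaultdict(list)
        for vec, label in zip(vectors, labels):
            grouped[label].append(vec)
        # The centroid of a class is the mean of its pooled segment vectors.
        self.centroids = {lab: mean_pool(vecs) for lab, vecs in grouped.items()}
        return self

    def predict(self, vector):
        # Pick the label whose centroid is closest in Euclidean distance.
        return min(self.centroids, key=lambda lab: dist(vector, self.centroids[lab]))


# Hypothetical frame embeddings for four short audio segments.
# In a real pipeline these would come from a pre-trained Wav2Vec encoder.
train_segments = [
    [[0.9, 0.1], [1.1, 0.0]],
    [[1.0, 0.2], [0.8, 0.1]],
    [[0.0, 1.0], [0.1, 0.9]],
    [[0.2, 1.1], [0.0, 0.8]],
]
train_labels = ["angry", "angry", "sad", "sad"]  # illustrative labels only

# Stage 1: turn each segment into a fixed-size feature vector.
features = [mean_pool(seg) for seg in train_segments]
# Stage 2: train the supervised classifier on those features.
clf = NearestCentroidClassifier().fit(features, train_labels)

test_segment = [[0.95, 0.05], [1.05, 0.15]]
predicted = clf.predict(mean_pool(test_segment))
print(predicted)
```

The point of the split is that the feature extractor needs no emotion labels, so only the small classifier on top has to be trained on labeled data.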

Publication date: 25 Sep 2023
Project Page: Not provided
Paper: https://arxiv.org/pdf/2309.12714
