The paper presents AMuSE (Adaptive Multimodal Analysis for Speaker Emotion), a model for recognizing individual speakers' emotions in group conversations, a capability needed for intelligent agents that interact naturally with humans. AMuSE pairs a Multimodal Attention Network, which captures cross-modal interactions at varying levels of spatial abstraction, with an Adaptive Fusion technique that combines the resulting mode-specific descriptors. Spatial and temporal features are condensed into two dense descriptors, one at the speaker level and one at the utterance level, and the model shows improved classification performance on large-scale public datasets.
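
To make the fusion idea concrete, here is a minimal PyTorch sketch of how modality-specific descriptors could be mixed by cross-modal attention and then combined with a learned, input-dependent gate. All module names, dimensions, and the gating formulation are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveFusion(nn.Module):
    """Weights each modality descriptor by a learned, input-dependent gate (assumed form)."""
    def __init__(self, dim: int, num_modalities: int = 3):
        super().__init__()
        self.gate = nn.Linear(dim * num_modalities, num_modalities)

    def forward(self, descriptors: torch.Tensor) -> torch.Tensor:
        # descriptors: (batch, num_modalities, dim)
        flat = descriptors.flatten(start_dim=1)
        weights = F.softmax(self.gate(flat), dim=-1)              # (batch, M) modality weights
        return (weights.unsqueeze(-1) * descriptors).sum(dim=1)   # (batch, dim) fused descriptor

class EmotionClassifier(nn.Module):
    """Hypothetical pipeline: cross-modal attention over per-modality descriptors,
    adaptive fusion into a single utterance-level descriptor, then emotion logits."""
    def __init__(self, dim: int = 128, num_emotions: int = 7):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.fusion = AdaptiveFusion(dim)
        self.head = nn.Linear(dim, num_emotions)

    def forward(self, text: torch.Tensor, audio: torch.Tensor, video: torch.Tensor):
        # Each input: (batch, dim) descriptor already produced by a modality encoder.
        x = torch.stack([text, audio, video], dim=1)   # (batch, 3, dim)
        x, _ = self.cross_attn(x, x, x)                # cross-modal interactions
        fused = self.fusion(x)                         # fused utterance-level descriptor
        return self.head(fused)                        # emotion logits

# Usage with random descriptors standing in for real encoder outputs.
model = EmotionClassifier()
logits = model(torch.randn(2, 128), torch.randn(2, 128), torch.randn(2, 128))
print(logits.shape)  # torch.Size([2, 7])
```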

 

Publication date: 31 Jan 2024
Project Page: Not provided
Paper: https://arxiv.org/pdf/2401.15164