This article introduces EmoCLIP, a vision-language model that learns rich latent representations for zero-shot dynamic facial expression recognition (FER). The model is evaluated with zero-shot classification on four popular dynamic FER datasets. The results show significant improvements over baseline methods, outperforming CLIP by more than 10% in Weighted Average Recall and 5% in Unweighted Average Recall. The learned representations also transfer well to the downstream task of mental health symptom estimation, achieving performance comparable to or better than state-of-the-art methods and strong agreement with human experts.
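To make the zero-shot setup concrete, here is a minimal sketch of CLIP-style zero-shot emotion classification using the open-source OpenAI `clip` package. This is an illustration only: EmoCLIP itself trains with sample-level text descriptions and a video encoder, whereas this simplified example uses vanilla CLIP with naive frame averaging, and the class descriptions and frame paths below are hypothetical placeholders.

```python
# Sketch of CLIP-style zero-shot classification over sampled video frames.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical natural-language descriptions standing in for emotion-class prompts.
class_descriptions = [
    "a person expressing happiness with a wide smile",
    "a person expressing sadness with downturned lips",
    "a person expressing anger with furrowed brows",
    "a person expressing surprise with raised eyebrows",
]
text_tokens = clip.tokenize(class_descriptions).to(device)

# Placeholder list of frames sampled from a video clip.
frame_paths = ["frame_000.jpg", "frame_008.jpg", "frame_016.jpg"]
frames = torch.stack([preprocess(Image.open(p)) for p in frame_paths]).to(device)

with torch.no_grad():
    frame_features = model.encode_image(frames)                # (T, D) per-frame embeddings
    video_feature = frame_features.mean(dim=0, keepdim=True)   # naive temporal pooling
    text_features = model.encode_text(text_tokens)             # (C, D) class embeddings

    # Cosine similarity between the pooled video embedding and each class prompt.
    video_feature = video_feature / video_feature.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    logits = 100.0 * video_feature @ text_features.T

predicted = logits.softmax(dim=-1).argmax(dim=-1).item()
print(f"Predicted class: {class_descriptions[predicted]}")
```

The prediction is simply the class whose text embedding is most similar to the video embedding, which is what makes the approach zero-shot: no classifier head is trained for the target label set.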

Publication date: 25 Oct 2023
Project Page: https://github.com/NickyFot/EmoCLIP
Paper: https://arxiv.org/pdf/2310.16640