TalkNCE: Improving Active Speaker Detection with Talk-Aware Contrastive Learning

This paper presents a new approach to improving Active Speaker Detection (ASD), the task of determining whether a person is speaking in a series of video frames. The authors propose TalkNCE, a talk-aware contrastive loss that encourages the model to learn effective representations by exploiting the natural correspondence between speech and facial movements. The loss can be optimized jointly with the existing objectives for training ASD models and requires no additional supervision or training data. The study shows that it integrates easily into existing ASD frameworks and improves their performance. The method achieves state-of-the-art performance on the AVA-ActiveSpeaker and ASW datasets.
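
To make the idea of a contrastive loss over speech and facial movements more concrete, here is a minimal sketch of a generic InfoNCE-style audio-visual loss in plain Python. It is not taken from the paper. The function names (`info_nce`, `joint_loss`), the temperature, the weighting factor, and the choice to treat audio and visual embeddings at the same frame index as positives and all other frames as negatives are assumptions made purely for illustration.

```python
import math


def info_nce(audio, visual, temperature=0.1):
    """Generic InfoNCE loss over per-frame embeddings (illustrative assumption).

    audio, visual: equal-length lists of embedding vectors, one per frame.
    The audio embedding at the same frame index is the positive for each
    visual embedding; audio embeddings at other indices act as negatives.
    """
    def normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]

    a = [normalize(vec) for vec in audio]
    v = [normalize(vec) for vec in visual]

    total = 0.0
    for i, v_i in enumerate(v):
        # Cosine similarity of visual frame i to every audio frame, scaled.
        logits = [sum(p * q for p, q in zip(v_i, a_j)) / temperature for a_j in a]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Cross-entropy with the matching frame index as the target.
        total += log_denominator - logits[i]
    return total / len(v)


def joint_loss(asd_loss, contrastive_loss, weight=0.5):
    """Hypothetical joint objective: existing ASD loss plus a weighted contrastive term."""
    return asd_loss + weight * contrastive_loss


# Toy embeddings: two frames, two dimensions.
audio_frames = [[1.0, 0.0], [0.0, 1.0]]
aligned_visual = [[1.0, 0.0], [0.0, 1.0]]   # each frame matches its audio
shuffled_visual = [[0.0, 1.0], [1.0, 0.0]]  # frames swapped in time

aligned = info_nce(audio_frames, aligned_visual)
shuffled = info_nce(audio_frames, shuffled_visual)
```

In this toy setup the loss is small when visual and audio frames line up in time and large when they are swapped, which is the behavior a contrastive term of this kind is meant to reward. The weighted sum in `joint_loss` only illustrates what optimizing such a term jointly with an existing ASD objective could look like; the paper's actual formulation and weighting may differ.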

Publication date: 21 Sep 2023
Project Page: https://arxiv.org/abs/2309.12306
Paper: https://arxiv.org/pdf/2309.12306
