FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild
The article introduces ‘FunnyNet-W’, a model that applies cross- and self-attention to visual, audio, and text data to predict funny moments in videos. Unlike most methods that rely on…