The study by Hummel et al. presents a framework for video-to-adverb retrieval and for the reverse task, adverb-to-video retrieval. The method aligns each video embedding with its corresponding compositional adverb-action text embedding in a joint embedding space. The adverb-action text embedding is learned with a residual gating mechanism. The framework outperforms prior work at retrieving adverbs from videos for unseen adverb-action compositions. The method is relevant to video search and retrieval and to fine-grained understanding of how actions are performed in videos.
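
To make the idea of composing an adverb with an action and matching the result against a video more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the gating form (a sigmoid gate on the action embedding plus a linear residual), the random weights, the embedding size, and all names (`compose`, `retrieve_adverb`, `W_gate`, `W_res`) are assumptions made purely for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def compose(action, adverb, W_gate, W_res):
    """Assumed residual gating: gate the action embedding, then add a residual."""
    joint = action + adverb  # concatenate the two embeddings
    gate = [sigmoid(g) for g in matvec(W_gate, joint)]
    residual = matvec(W_res, joint)
    return l2_normalize([g * a + r for g, a, r in zip(gate, action, residual)])


def retrieve_adverb(video, action, adverbs, W_gate, W_res):
    """Rank adverb names by cosine similarity to the video embedding."""
    v = l2_normalize(video)
    scores = {
        name: sum(a * b for a, b in zip(v, compose(action, emb, W_gate, W_res)))
        for name, emb in adverbs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: random weights and embeddings stand in for learned ones (hypothetical).
rng = random.Random(0)
dim = 4
W_gate = [[rng.uniform(-1, 1) for _ in range(2 * dim)] for _ in range(dim)]
W_res = [[rng.uniform(-1, 1) for _ in range(2 * dim)] for _ in range(dim)]
action = [rng.uniform(-1, 1) for _ in range(dim)]
adverbs = {
    name: [rng.uniform(-1, 1) for _ in range(dim)]
    for name in ("slowly", "quickly", "carefully")
}

# Pretend the video embedding coincides with the composed "quickly" embedding.
video = compose(action, adverbs["quickly"], W_gate, W_res)
ranking = retrieve_adverb(video, action, adverbs, W_gate, W_res)
print(ranking)
```

In a trained system the weights and embeddings would be learned so that real video features land near the matching adverb-action composition; here the toy video embedding is constructed to match one composition so that the ranking step can be followed end to end.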

Publication date: 26 Sep 2023
Project Page: https://hummelth.github.io/ReGaDa/
Paper: https://arxiv.org/pdf/2309.15086