This paper presents a method to improve the understanding and analysis of long-form videos. The authors propose a query-aware approach for localizing and discriminating relations in long videos, built on an image-language pretrained model. The model selects only the frames relevant to a specific query, eliminating the need to construct a complete movie-level knowledge graph. The approach outperforms competing methods across different query types, demonstrating its effectiveness and robustness.
Publication date: 20 Oct 2023
Project Page: https://doi.org/10.1145/3581783.3612871
Paper: https://arxiv.org/pdf/2310.12724
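To make the query-aware frame-selection idea concrete, below is a minimal sketch of scoring video frames against a text query with an off-the-shelf image-language pretrained model (CLIP via Hugging Face transformers). The model choice, function name, and top-k heuristic are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch: query-aware frame selection with a CLIP-style
# image-language pretrained model. Model choice, helper names, and the
# top-k strategy are assumptions for illustration only.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

def select_relevant_frames(frames, query, top_k=8):
    """Score each sampled frame against the text query and keep the top-k.

    frames: list of PIL.Image sampled from the long video
    query:  natural-language query, e.g. "two characters arguing in a kitchen"
    """
    with torch.no_grad():
        inputs = processor(text=[query], images=frames,
                           return_tensors="pt", padding=True)
        outputs = model(**inputs)
        # Normalize image/text embeddings, then take cosine similarity per frame.
        image_emb = outputs.image_embeds / outputs.image_embeds.norm(dim=-1, keepdim=True)
        text_emb = outputs.text_embeds / outputs.text_embeds.norm(dim=-1, keepdim=True)
        scores = (image_emb @ text_emb.T).squeeze(-1)
    top = scores.topk(min(top_k, len(frames)))
    # Return frame indices (in sampling order) and their similarity scores.
    return top.indices.tolist(), top.values.tolist()
```

The selected frames (rather than the whole video) would then be passed to downstream relation reasoning, which is the main efficiency benefit the summary describes.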