The study introduces LongStoryShort, a zero-shot framework that uses GPT-3 for narrative video question answering. It first summarizes the video's narrative into a short plot, then searches for the parts of the video relevant to the question. It also strengthens visual matching with CLIPCheck. The model outperforms state-of-the-art supervised models by a large margin, which highlights the potential of zero-shot QA for long videos.
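
The summarize-then-search idea can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the prompt wording, the `top_k` parameter, and the assumption that each clip is already available as a text description are all hypothetical, and `llm` stands for any text-completion callable (GPT-3 in the study). The CLIPCheck step is omitted.

```python
import re
from typing import Callable, List, Sequence


def parse_indices(reply: str, n: int) -> List[int]:
    """Extract distinct integers in [0, n) from an LLM reply, keeping order."""
    seen: List[int] = []
    for token in re.findall(r"\d+", reply):
        i = int(token)
        if i < n and i not in seen:
            seen.append(i)
    return seen


def summarize_then_search(
    clips: Sequence[str],          # hypothetical: one text description per clip
    question: str,
    choices: Sequence[str],
    llm: Callable[[str], str],     # hypothetical: prompt in, completion out
    top_k: int = 2,
) -> int:
    """Return the index of the selected answer choice."""
    # Step 1 (summarize): condense each clip into one line of a numbered plot.
    plot = [llm(f"Summarize this scene in one sentence:\n{c}") for c in clips]
    numbered = "\n".join(f"({i}) {s}" for i, s in enumerate(plot))

    # Step 2 (search): ask which plot pieces are relevant to the question.
    reply = llm(
        f"Plot:\n{numbered}\nQuestion: {question}\n"
        "Which plot indices are relevant?"
    )
    # Fall back to every clip if the reply contains no usable index.
    picked = parse_indices(reply, len(clips))[:top_k] or list(range(len(clips)))

    # Step 3 (answer): answer using only the selected clips as context.
    context = "\n".join(clips[i] for i in picked)
    options = "\n".join(f"({i}) {c}" for i, c in enumerate(choices))
    answer = parse_indices(
        llm(
            f"Context:\n{context}\nQuestion: {question}\n"
            f"Options:\n{options}\nAnswer index:"
        ),
        len(choices),
    )
    # Default to the first choice if the reply cannot be parsed.
    return answer[0] if answer else 0
```

Passing the language model in as a plain callable keeps the sketch independent of any particular API client, so a stub can stand in for it when trying out the control flow.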

Publication date: 2 Nov 2023
Project Page: https://jiwanchung.github.io, https://yj-yu.github.io/homeMIR
Paper: https://arxiv.org/pdf/2311.01233