The paper introduces a method for Temporal Sentence Grounding (TSG) in long-form egocentric datasets such as Ego4D and EPIC-Kitchens. The method, Clip Merging (CliMer), learns to ground sentences using only narrations and their corresponding rough narration timestamps. Compared with a high-performing TSG method, CliMer improves mean R@1 from 3.9 to 5.7 on Ego4D and from 10.7 to 13.0 on EPIC-Kitchens.
Publication date: 26 Oct 2023
Project Page: https://github.com/keflanagan/CliMer
Paper: https://arxiv.org/pdf/2310.17395
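
For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how a mean R@1 score is commonly computed in TSG evaluation: the top-ranked predicted segment for each query counts as a hit if its temporal IoU with the ground-truth segment meets a threshold, and the resulting recall is averaged over several thresholds. The function names and the thresholds (0.1, 0.3, 0.5) are assumptions made for illustration; this is not the authors' evaluation code, and the paper should be consulted for the exact protocol.

```python
from typing import Sequence, Tuple

Segment = Tuple[float, float]  # (start_sec, end_sec)


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Intersection over union of two time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(preds: Sequence[Segment], gts: Sequence[Segment],
                threshold: float) -> float:
    """Percentage of queries whose top-1 prediction reaches the IoU threshold."""
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(preds, gts))
    return 100.0 * hits / len(gts)


def mean_recall_at_1(preds: Sequence[Segment], gts: Sequence[Segment],
                     thresholds: Sequence[float] = (0.1, 0.3, 0.5)) -> float:
    """Average R@1 over IoU thresholds (the thresholds are an assumption)."""
    return sum(recall_at_1(preds, gts, t) for t in thresholds) / len(thresholds)


# Toy example with made-up segments, one top-1 prediction per query.
example_preds = [(0.0, 10.0), (4.0, 6.0)]
example_gts = [(0.0, 10.0), (0.0, 10.0)]
example_score = mean_recall_at_1(example_preds, example_gts)
```

In the toy example the first prediction matches its ground truth exactly, while the second overlaps only a fifth of it, so it counts as a hit at the loosest threshold only.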