The study proposes TMac, a temporal multi-modal graph learning method for acoustic event classification that improves how deep learning models process audiovisual data. TMac constructs a temporal graph for each acoustic event by dividing its audio and video data into multiple segments. Each segment is treated as a node, and the edges between nodes carry timestamps that encode their temporal relationships, allowing dynamic information to be captured. In experiments, the method outperformed other state-of-the-art models.
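To make the graph construction concrete, below is a minimal Python sketch of the idea described above: audio and video segments become nodes, and edges between them carry timestamps. The data layout, function names, and edge-wiring choices here are illustrative assumptions, not the authors' actual implementation (see the project page for that).

```python
# Illustrative sketch of a temporal multi-modal graph (not the authors' code).
from dataclasses import dataclass, field

@dataclass
class Node:
    modality: str      # "audio" or "video"
    segment_idx: int   # position of the segment within the clip
    feature: list      # placeholder for the segment's feature vector

@dataclass
class TemporalGraph:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)  # (src, dst, timestamp)

def build_temporal_graph(audio_feats, video_feats, seg_duration=1.0):
    """Build one temporal graph for a single acoustic event.

    Each audio/video segment becomes a node; edges between consecutive
    segments (and across modalities at the same time step) carry a
    timestamp so temporal order is preserved.
    """
    g = TemporalGraph()
    for i, f in enumerate(audio_feats):
        g.nodes.append(Node("audio", i, f))
    offset = len(audio_feats)
    for i, f in enumerate(video_feats):
        g.nodes.append(Node("video", i, f))

    for i in range(len(audio_feats)):
        t = i * seg_duration
        if i + 1 < len(audio_feats):       # intra-modal (audio) temporal edge
            g.edges.append((i, i + 1, t))
        if i < len(video_feats):           # cross-modal edge at the same time step
            g.edges.append((i, offset + i, t))
    for i in range(len(video_feats) - 1):  # intra-modal (video) temporal edge
        g.edges.append((offset + i, offset + i + 1, i * seg_duration))
    return g

# Example: 4 one-second segments per modality with dummy 2-D features
audio = [[0.1, 0.2]] * 4
video = [[0.3, 0.4]] * 4
graph = build_temporal_graph(audio, video)
print(len(graph.nodes), len(graph.edges))  # 8 nodes, 10 timestamped edges
```

The sketch only covers graph construction; in TMac itself, a graph neural network then aggregates information over these timestamped edges to capture the dynamic structure of each event.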

 

Publication date: 25 Sep 2023
Project Page: https://github.com/MGitHubL/TMac
Paper: https://arxiv.org/pdf/2309.11845