The paper surveys the challenges and recent trends in understanding action scenes in soccer. It covers three tasks: action recognition, action spotting, and spatio-temporal action localization, with a focus on multimodal methods that combine several data sources, such as video and audio. The authors review publicly available data sources, evaluation metrics, and state-of-the-art models, including both deep learning and traditional methods. They also highlight the potential of multimodal methods to improve the accuracy and robustness of these models. The paper concludes with a discussion of open research questions and future directions in soccer action recognition.

Publication date: 22 Sep 2023
arXiv Page: https://arxiv.org/abs/2309.12067v1
Paper: https://arxiv.org/pdf/2309.12067