Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition
In video recognition, high accuracy usually comes at a significant computational cost. The proposed Video-FocalNets architecture combines the efficiency of convolutional designs with the global context modeling of…