
Artificial Intelligence; Computer Vision and Pattern Recognition

Efficient Selective Audio Masked Multimodal Bottleneck Transformer for Audio-Video Classification

By root, January 11, 2024

The article introduces a novel audio-video recognition approach, the Audio-Video Transformer (AVT), which uses an effective spatio-temporal representation to improve action recognition. The research reduces cross-modality complexity via an audio-video…


