Sound

The creation, transmission, and interpretation of audio signals on computer systems.

Efficient Selective Audio Masked Multimodal Bottleneck Transformer for Audio-Video Classification

The article introduces the Audio-Video Transformer (AVT), an audio-video recognition approach that uses an effective spatio-temporal representation to improve action recognition. The research reduces cross-modality complexity via an audio-video…