The paper presents LAVSS, a location-guided audio-visual spatial audio separator. Existing monaural audio-visual separation (MAVS) methods often overlook the location of the sound source, a cue that is crucial in VR/AR scenarios. LAVSS addresses this by incorporating spatial cues and positional representations of sounding objects, which helps distinguish similar-sounding sources located in different directions. It also applies multi-level cross-modal attention so that visual and positional features collaborate with audio features, and it leverages a pre-trained monaural separator to boost spatial audio separation. Experiments on the FAIR-Play dataset show that LAVSS outperforms existing baselines.
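The cross-modal attention described above can be pictured as audio features (queries) attending over visual and positional tokens (keys/values). Below is a minimal single-level numpy sketch; the function name, random projection weights, residual fusion, and tensor shapes are all illustrative assumptions, not the authors' exact design:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(audio_feats, visual_feats, pos_feats, rng=None):
    """Audio queries attend over visual and positional keys/values.

    audio_feats:  (T, d) audio time-frequency features
    visual_feats: (N, d) visual object/patch features
    pos_feats:    (N, d) positional representations of sounding objects
    """
    rng = rng or np.random.default_rng(0)
    d = audio_feats.shape[1]
    # Hypothetical projection weights; in the real model these are learned.
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    # Concatenate visual and positional tokens into one context sequence.
    context = np.concatenate([visual_feats, pos_feats], axis=0)   # (2N, d)
    Q = audio_feats @ Wq
    K = context @ Wk
    V = context @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d))                          # (T, 2N)
    # Residual fusion of attended visual-positional context into audio.
    return audio_feats + attn @ V                                 # (T, d)
```

In the paper this fusion is applied at multiple feature levels; stacking this block at several decoder scales would approximate that "multi-level" design.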


Publication date: 31 Oct 2023
Project Page: https://yyx666660.github.io/LAVSS/
Paper: https://arxiv.org/pdf/2310.20446