January 31, 2024

Enhanced Sound Event Localization and Detection in Real 360-degree audio-visual soundscapes

The authors present an enhanced audio-visual Sound Event Localization and Detection (SELD) network that extends the audio-only SELDnet23 model by fusing audio and video information. The system incorporates the YOLO and DETIC object detectors, and its framework implements audio-visual data augmentation and synthetic data generation. The resulting SELD system outperforms the existing audio-visual SELD baseline. The authors also introduce video and audio processing techniques for model training and release their work as an open-source framework.

Publication date: 31 Jan 2024
Project Page: https://github.com/aromanusc/SoundQ
Paper: https://arxiv.org/pdf/2401.17129

3D Detection, Audio-visual correspondence, SELDnet, Sound event localization, soundscapes
