The authors present an enhanced audio-visual Sound Event Localization and Detection (SELD) network that improves on the audio-only SELDnet23 model by fusing audio and video information. The system leverages the YOLO and Detic object detectors within a framework that also performs audio-visual data augmentation and synthetic data generation, and it outperforms the existing audio-visual SELD baseline. The authors additionally introduce novel video and audio processing techniques for model training and release their work as an open-source framework.
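One way such detector outputs can feed a SELD model is by mapping a bounding-box center in a 360° (equirectangular) video frame to a direction-of-arrival estimate that can be aligned with the audio branch. The helper below is a minimal sketch of that geometric mapping only; the function name and the assumption of equirectangular frames are illustrative and not taken from the paper, whose actual fusion pipeline may differ.

```python
import math

def bbox_center_to_doa(cx: float, cy: float, frame_w: int, frame_h: int):
    """Map a bounding-box center (in pixels) on an equirectangular 360-degree
    frame to (azimuth, elevation) in degrees.

    Hypothetical helper for illustration: assumes the frame spans
    azimuth [-180, 180) left-to-right and elevation [90, -90] top-to-bottom.
    """
    azimuth = (cx / frame_w) * 360.0 - 180.0   # horizontal position -> azimuth
    elevation = 90.0 - (cy / frame_h) * 180.0  # vertical position -> elevation
    return azimuth, elevation

# A detection centered in a 1920x1080 frame points straight ahead:
az, el = bbox_center_to_doa(960, 540, 1920, 1080)
print(az, el)  # 0.0 0.0
```

A direction obtained this way can then be compared or combined with the spatial cues the audio branch extracts from the multichannel input.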
Publication date: 31 Jan 2024
Project Page: https://github.com/aromanusc/SoundQ
Paper: https://arxiv.org/pdf/2401.17129