SoundCount: Sound Counting from Raw Audio with Dyadic Decomposition Neural Network

This paper proposes DyDecNet, an end-to-end trainable neural network that counts the number of distinct sounds in raw audio. The problem has received little attention despite its importance in many fields. DyDecNet uses dyadic decomposition to progressively split the raw waveform along the frequency axis, building a time-frequency representation in a multi-stage, coarse-to-fine manner. The paper also introduces energy gain normalization to compensate for variance in sound loudness and for spectrum overlap. It also designs three polyphony-aware metrics that better quantify how difficult a given sound counting task is. Experiments on several datasets show DyDecNet outperforming other methods, and the authors suggest it could also be applied to other acoustic tasks.
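The summary does not describe DyDecNet's actual filters, so the following is only a minimal sketch of the general idea of dyadic decomposition. It uses fixed Haar filters as a stand-in for the paper's learned ones. Each stage splits every current band into a lower and an upper half-band and downsamples each by 2. After K stages, a waveform of length N becomes 2^K sub-bands of length N/2^K, which is a coarse-to-fine time-frequency representation. The function names `haar_split` and `dyadic_decompose` are hypothetical and do not come from the SoundCount repository.

```python
import numpy as np


def haar_split(x):
    """Split a 1-D signal into low- and high-frequency halves using Haar
    filters, downsampling each half by 2 (illustrative stand-in for
    learned filters)."""
    x = x[: len(x) // 2 * 2]            # drop a trailing sample so the length is even
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2.0)   # Haar low-pass + downsample by 2
    high = (even - odd) / np.sqrt(2.0)  # Haar high-pass + downsample by 2
    return low, high


def dyadic_decompose(waveform, num_stages):
    """Recursively split every band at each stage.

    Returns an array of shape (2**num_stages, len(waveform) // 2**num_stages).
    The bands are not reordered into strict ascending-frequency order.
    """
    bands = [np.asarray(waveform, dtype=np.float64)]
    for _ in range(num_stages):
        next_bands = []
        for band in bands:
            low, high = haar_split(band)
            next_bands.extend([low, high])
        bands = next_bands
    return np.stack(bands)


rng = np.random.default_rng(0)
audio = rng.standard_normal(16000)  # 1 s of noise at an assumed 16 kHz sample rate
tf = dyadic_decompose(audio, num_stages=4)
print(tf.shape)  # (16, 1000)
```

The Haar filters are orthonormal, so this fixed decomposition preserves signal energy when the length is divisible by 2^K. A trainable network such as the one described above would instead learn its splitting filters from data.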

Publication date: 29 Dec 2023
Project Page: github.com/yuhanghe01/SoundCount
Paper: https://arxiv.org/pdf/2312.16149
