The article introduces full-frequency dynamic convolution (FFDConv) for sound event detection. Standard 2D convolution is a poor fit for this task because it enforces translation equivariance along the frequency axis of a spectrogram, yet frequency is not a shift-invariant dimension: the same pattern moved to a different frequency can correspond to a different sound event. To address this, the authors propose generating a frequency kernel for every frequency band, which gives 2D convolution a frequency-dependent modeling capability. In the reported experiments, the method outperformed the other techniques it was compared against and extracted coherent features within specific frequency bands. The results suggest that FFDConv has strong frequency-dependent perception ability, making it a promising tool for sound event detection.
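To make the equivariance argument concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: the helper names (`conv2d_same`, `shift_freq`), the toy 8x4 input, and the hand-picked kernels are all assumptions for illustration. In particular, FFDConv generates its per-band kernels from the input, whereas this sketch uses a fixed gain per frequency bin as a stand-in. It only shows that a shared kernel commutes with a frequency shift while per-frequency kernels do not.

```python
def conv2d_same(x, kernel_for_row):
    """Zero-padded 3x3 cross-correlation over a (freq x time) grid.

    kernel_for_row(f) returns the 3x3 kernel used for output frequency bin f.
    """
    n_freq, n_time = len(x), len(x[0])
    out = [[0.0] * n_time for _ in range(n_freq)]
    for f in range(n_freq):
        k = kernel_for_row(f)
        for t in range(n_time):
            acc = 0.0
            for df in (-1, 0, 1):
                for dt in (-1, 0, 1):
                    ff, tt = f + df, t + dt
                    if 0 <= ff < n_freq and 0 <= tt < n_time:
                        acc += k[df + 1][dt + 1] * x[ff][tt]
            out[f][t] = acc
    return out


def shift_freq(x, s):
    """Shift the grid up by s frequency bins, filling vacated rows with zeros."""
    n_freq, n_time = len(x), len(x[0])
    return [
        list(x[f - s]) if 0 <= f - s < n_freq else [0.0] * n_time
        for f in range(n_freq)
    ]


# Hypothetical base kernel shared by every frequency bin (standard 2D conv).
BASE = [[0.0, 1.0, 0.0],
        [1.0, 2.0, 1.0],
        [0.0, 1.0, 0.0]]


def static_kernel(f):
    return BASE


def freq_dependent_kernel(f):
    # Assumption: a fixed gain of (f + 1) per bin stands in for the
    # input-generated per-band kernels described in the article.
    return [[(f + 1) * w for w in row] for row in BASE]


# Toy "spectrogram": 8 frequency bins x 4 frames, one active cell.
x = [[0.0] * 4 for _ in range(8)]
x[2][1] = 1.0
x_shifted = shift_freq(x, 3)  # same pattern, 3 bins higher

# Shared kernel: convolving then shifting equals shifting then convolving.
static_equivariant = (
    conv2d_same(x_shifted, static_kernel)
    == shift_freq(conv2d_same(x, static_kernel), 3)
)

# Per-frequency kernels: the response depends on where the pattern sits.
freq_dep_equivariant = (
    conv2d_same(x_shifted, freq_dependent_kernel)
    == shift_freq(conv2d_same(x, freq_dependent_kernel), 3)
)

print("shared kernel equivariant along frequency:", static_equivariant)
print("per-frequency kernels equivariant along frequency:", freq_dep_equivariant)
```

The shift of 3 bins is chosen so the active cell stays away from the grid edges, which keeps zero padding from affecting the comparison.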

 

Publication date: 10 Jan 2024
Project Page: FFDConv
Paper: https://arxiv.org/pdf/2401.04976