This research introduces dynamic Convolutional Neural Networks (CNNs) that outperform traditional efficient CNNs and Transformers in terms of the performance complexity trade-off and parameter efficiency. These dynamic CNNs are constructed of dynamic non-linearities, dynamic convolutions and attention mechanisms. They are used for audio tagging on large-scale datasets like AudioSet. The researchers show that these dynamic CNNs perform better on downstream tasks and scale up well, surpassing Transformers’ performance on AudioSet and several downstream tasks.

 

Publication date: 25 Oct 2023
Project Page: https://github.com/fschmid56/EfficientAT
Paper: https://arxiv.org/pdf/2310.15648