The article introduces ATGNN (Audio Tagging Graph Neural Network), a deep learning model for audio tagging. To address the limitations of CNNs and Transformers, it treats the spectrogram as a graph and processes that graph with a graph neural network. This lets the model map semantic relationships between class embeddings and the spectrogram regions they correspond to. ATGNN achieves results comparable to Transformer-based models on audio tagging tasks with significantly fewer learnable parameters.
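To make "treating the spectrogram as a graph" concrete, here is a minimal sketch of one way it could be done. It is not the paper's implementation. The patch size, the use of raw patch values as node features, Euclidean distance, and the helper names `patchify` and `knn_edges` are all assumptions chosen for illustration.

```python
import math


def patchify(spec, ph, pw):
    """Split a 2-D spectrogram (list of rows) into non-overlapping
    ph x pw patches; each patch is flattened into one node feature vector."""
    n_rows, n_cols = len(spec), len(spec[0])
    patches = []
    for r in range(0, n_rows - ph + 1, ph):
        for c in range(0, n_cols - pw + 1, pw):
            patches.append(
                [spec[r + i][c + j] for i in range(ph) for j in range(pw)]
            )
    return patches


def knn_edges(nodes, k):
    """For each node, return the indices of its k nearest other nodes.
    Distance is Euclidean on the raw features (an assumption of this sketch)."""
    neighbours = []
    for i, a in enumerate(nodes):
        dists = sorted(
            (math.dist(a, b), j) for j, b in enumerate(nodes) if j != i
        )
        neighbours.append([j for _, j in dists[:k]])
    return neighbours


# Toy 4x4 "spectrogram": low energy on the left, high energy on the right.
spec = [
    [0.0, 0.1, 0.9, 1.0],
    [0.1, 0.0, 1.0, 0.9],
    [0.0, 0.1, 0.8, 1.0],
    [0.1, 0.0, 1.0, 0.9],
]

nodes = patchify(spec, 2, 2)   # 4 nodes, one per 2x2 patch
edges = knn_edges(nodes, k=1)  # each node linked to its most similar patch
```

In this toy input the two low-energy patches link to each other, as do the two high-energy ones, even though they are not adjacent in the grid. Edges that follow similarity instead of grid position are what distinguish a graph view from a fixed convolution window.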
Publication date: 8 Nov 2023
Project Page: N/A
Paper: https://arxiv.org/pdf/2311.01526