The paper examines the inference efficiency of self-supervised pre-trained audio models. It shows that a simple pre-trained audio model can match the inference efficiency of more complex models that use speech transformer encoders, which mix convolutional modules with self-attention modules. The study finds that advanced self-attention alone achieves similar efficiency, and that this simpler design is especially beneficial when combined with low-bit quantization of the network's weights.
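
As background on the quantization component, here is a minimal sketch of symmetric per-tensor low-bit weight quantization in PyTorch. The helper names (`quantize_weights`, `dequantize`), the 4-bit setting, and the example tensor are assumptions for illustration; the paper's actual quantization scheme may differ.

```python
import torch

def quantize_weights(w: torch.Tensor, bits: int = 4):
    """Symmetric per-tensor quantization of a weight tensor to `bits` bits.

    Returns integer codes plus the scale needed to dequantize. This is a
    generic illustration of low-bit weight quantization, not the paper's
    specific quantizer.
    """
    qmax = 2 ** (bits - 1) - 1            # e.g. 7 for signed 4-bit codes
    scale = w.abs().max() / qmax          # single scale for the whole tensor
    q = torch.clamp(torch.round(w / scale), min=-qmax - 1, max=qmax)
    return q.to(torch.int8), scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Map integer codes back to an approximate float weight tensor."""
    return q.float() * scale

# Quantize a random "weight matrix" and check the reconstruction error.
w = torch.randn(256, 256)
q, scale = quantize_weights(w, bits=4)
w_hat = dequantize(q, scale)
print(f"mean abs error: {(w - w_hat).abs().mean().item():.5f}")
```

Storing weights as 4-bit codes plus a scale shrinks the model's memory footprint roughly 8x relative to 32-bit floats, which is the efficiency lever the paper pairs with its attention-only encoder.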

Publication date: 5 Nov 2023
Project Page: https://arxiv.org/abs/2311.02772v1
Paper: https://arxiv.org/pdf/2311.02772