Attention or Convolution: Transformer Encoders in Audio Language Models for Inference Efficiency
The paper examines the inference efficiency of self-supervised pre-trained audio models, positing that they can match the inference efficiency of more complex models built on speech transformer encoders. These…