The article discusses a new approach to improving end-to-end automatic speech recognition (ASR) on long utterances. The authors propose adding a memory-augmented neural network between the encoder and decoder of a conformer, an architecture that itself outperforms recurrent neural network-based approaches and transformers. The external memory can improve generalization to longer utterances because it lets the system store and retrieve more information recurrently. Using a neural Turing machine (NTM) as the memory yields a new ASR architecture, the Conformer-NTM. Experimental results show that this system outperforms the baseline conformer without memory on long utterances.
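
To make the "store and retrieve" idea concrete, here is a minimal sketch of the generic NTM memory operations: content-based addressing, a weighted read, and an erase-then-add write. It is not the authors' implementation and says nothing about how the Conformer-NTM wires the memory between encoder and decoder. The function names, the 3x3 memory, and the `beta` value are all assumptions chosen for illustration.

```python
import math


def content_weights(memory, key, beta):
    """Content-based addressing: softmax over beta-scaled cosine similarity."""
    eps = 1e-8  # guards against division by zero for all-zero rows
    key_norm = math.sqrt(sum(k * k for k in key))
    scores = []
    for row in memory:
        row_norm = math.sqrt(sum(x * x for x in row))
        dot = sum(x * k for x, k in zip(row, key))
        scores.append(beta * dot / (row_norm * key_norm + eps))
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def read(memory, weights):
    """Read vector: weighted sum of the memory rows."""
    width = len(memory[0])
    return [sum(w * row[j] for w, row in zip(weights, memory)) for j in range(width)]


def write(memory, weights, erase, add):
    """Erase-then-add update; returns a new memory matrix."""
    return [
        [x * (1.0 - w * e) + w * a for x, e, a in zip(row, erase, add)]
        for w, row in zip(weights, memory)
    ]


# Hypothetical toy memory: 3 slots, each of width 3.
memory = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Retrieve: a key resembling slot 1 puts most of the weight on slot 1.
weights = content_weights(memory, key=[0.0, 1.0, 0.0], beta=10.0)
read_vector = read(memory, weights)

# Store: fully erase slot 1 and write a new vector into it.
memory = write(memory, [0.0, 1.0, 0.0], erase=[1.0, 1.0, 1.0], add=[0.5, 0.5, 0.5])
```

In a trained model, a controller network would produce the key, `beta`, and the erase and add vectors at each step. Every operation is differentiable, which is what lets such a memory be trained end to end with the rest of the network.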

Publication date: 25 Sep 2023
Project Page: Not Provided
Paper: https://arxiv.org/pdf/2309.13029