Key Frame Mechanism For Efficient Conformer Based End-to-end Speech Recognition

The article presents a method for making Conformer-based end-to-end automatic speech recognition (ASR) more efficient. A Conformer block combines a self-attention mechanism, which captures global context, with a convolutional neural network, which captures local context. However, the cost of self-attention grows quadratically with the length of the input sequence. To reduce this computation, the authors propose a key frame-based self-attention mechanism. The model uses two encoders. After the first encoder, an intermediate CTC (Connectionist Temporal Classification) loss is used to compute a label for each frame, and frames with a non-blank label are treated as key frames. Frames labelled blank can then be dropped before the second encoder. According to the authors, this approach discards more than 60% of the frames as useless during both training and inference, which significantly accelerates inference.
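The frame-dropping idea can be illustrated with a minimal sketch. It assumes greedy (argmax) decoding of the intermediate CTC output and a blank symbol at index 0. It is not the authors' implementation: the function names are hypothetical, and real models do this on batched tensors inside the encoder. The sketch labels each frame, keeps only the non-blank (key) frames, and compares the number of query-key scores full self-attention would compute before and after dropping.

```python
from typing import List, Sequence, Tuple

# Assumption: index 0 of the CTC vocabulary is the blank symbol.
BLANK = 0


def ctc_frame_labels(log_probs: Sequence[Sequence[float]]) -> List[int]:
    """Greedy (argmax) intermediate-CTC label for each frame."""
    return [max(range(len(row)), key=row.__getitem__) for row in log_probs]


def drop_blank_frames(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    blank: int = BLANK,
) -> Tuple[List[Sequence[float]], List[int]]:
    """Keep only key frames, i.e. frames whose CTC label is not blank.

    Returns the shortened feature sequence and the original indices
    of the kept frames.
    """
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    kept = [t for t, label in enumerate(labels) if label != blank]
    return [features[t] for t in kept], kept


def attention_scores(num_frames: int) -> int:
    """Number of query-key scores in full self-attention (quadratic in T)."""
    return num_frames * num_frames


if __name__ == "__main__":
    # Toy intermediate CTC output: 4 frames, vocabulary {blank, a, b}.
    log_probs = [
        [-0.1, -3.0, -3.0],  # blank
        [-2.0, -0.2, -3.0],  # a
        [-0.1, -3.0, -3.0],  # blank
        [-3.0, -3.0, -0.1],  # b
    ]
    labels = ctc_frame_labels(log_probs)
    features = [[float(t)] * 2 for t in range(len(labels))]  # dummy features
    reduced, kept = drop_blank_frames(features, labels)
    print(labels)  # [0, 1, 0, 2]
    print(kept)  # [1, 3]
    print(attention_scores(len(features)), "->", attention_scores(len(reduced)))  # 16 -> 4
```

Because full self-attention scales with the square of the sequence length, keeping 40% of the frames would leave roughly 0.4² = 16% of the attention scores in any layer that sees only the reduced sequence. Layers that still process the full sequence, such as the first encoder here, do not benefit from this saving.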

Publication date: 25 Oct 2023
Project Page: https://github.com/scufan1990/Key-Frame-Mechanism-For-Efficient-Conformer
Paper: https://arxiv.org/pdf/2310.14954
