This research paper focuses on making end-to-end speech recognition more efficient at inference time. The researchers propose a dynamic layer-skipping method that uses the CTC blank output of an intermediate encoder layer: frames whose intermediate blank probability is high skip the remaining encoder layers, while the other frames pass through the full encoder. This reduces computation per utterance, and the researchers report a 29% acceleration of CTC model inference with only minor recognition-accuracy degradation.
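The frame-level skipping idea can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the layer split point, the 0.9 blank-probability threshold, and the helper names (`skip_encoder`, `inter_ctc_proj`) are assumptions introduced here for clarity.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def skip_encoder(frames, early_layers, late_layers, inter_ctc_proj,
                 blank_id=0, threshold=0.9):
    """Run all frames through the early encoder layers, then skip the
    late layers for frames the intermediate CTC head deems blank.

    frames:         (T, D) array of frame features
    early_layers:   callables applied to every frame
    late_layers:    callables applied only to likely non-blank frames
    inter_ctc_proj: (D, V) projection to intermediate CTC logits
    """
    h = frames
    for layer in early_layers:          # shared lower encoder stack
        h = layer(h)

    # Intermediate CTC posteriors decide which frames continue.
    probs = softmax(h @ inter_ctc_proj)
    keep = probs[:, blank_id] < threshold   # low blank prob -> keep processing

    out = h.copy()                      # skipped frames reuse the early output
    sub = h[keep]
    for layer in late_layers:           # upper stack runs on fewer frames
        sub = layer(sub)
    out[keep] = sub
    return out, keep
```

With many blank frames (silence), most of the upper encoder's work is avoided, which is where the reported speed-up comes from.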

Publication date: 4 Jan 2024
Project Page: https://arxiv.org/abs/2401.02046v1
Paper: https://arxiv.org/pdf/2401.02046