September 25, 2023

Massive End-to-end Models for Short Search Queries

The authors examine two popular end-to-end automatic speech recognition (ASR) models, Connectionist Temporal Classification (CTC) and RNN-Transducer (RNN-T), for offline recognition of voice search queries. Both models use the neural architecture of Google's Universal Speech Model and add funnel pooling layers that reduce the frame rate and speed up training and inference. The authors find that a 900M-parameter RNN-T outperforms a 1.8B-parameter CTC and is more tolerant of severe time reduction.
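
To make the idea of frame-rate reduction concrete, here is a minimal sketch in plain Python. It is not the paper's funnel pooling layer, whose details this summary does not give. It assumes simple strided average pooling over the time axis as a stand-in, and the names `avg_pool_time` and `frame_count_after` are hypothetical helpers introduced only for illustration.

```python
from typing import List

Frame = List[float]  # one feature vector per time step


def avg_pool_time(frames: List[Frame], stride: int) -> List[Frame]:
    """Average every `stride` consecutive frames into a single frame.

    Assumption: plain average pooling stands in for the paper's funnel
    pooling. The final window may hold fewer than `stride` frames.
    """
    if stride < 1:
        raise ValueError("stride must be >= 1")
    pooled: List[Frame] = []
    for start in range(0, len(frames), stride):
        window = frames[start:start + stride]
        dim = len(window[0])
        pooled.append(
            [sum(frame[d] for frame in window) / len(window) for d in range(dim)]
        )
    return pooled


def frame_count_after(num_frames: int, strides: List[int]) -> int:
    """Sequence length left after applying pooling layers in order."""
    n = num_frames
    for stride in strides:
        n = -(-n // stride)  # ceiling division
    return n


# Eight toy frames with two features each.
frames = [[float(t), float(2 * t)] for t in range(8)]
pooled = avg_pool_time(frames, stride=2)
print(len(pooled))                       # 4 frames remain
print(frame_count_after(100, [2, 2]))    # 100 -> 50 -> 25
```

Each pooling layer shortens the sequence that later layers and the decoder must process, which is where the training and inference savings come from. The trade-off is that very aggressive reduction leaves fewer output frames per query, the regime in which the authors report the RNN-T holding up better than the CTC model.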

Publication date: 22 Sep 2023
Project Page: https://arxiv.org/abs/2309.12963v1
Paper: https://arxiv.org/pdf/2309.12963

