High-precision Voice Search Query Correction via Retrievable Speech-text Embedings

This paper presents a technique for improving the accuracy of Automatic Speech Recognition (ASR) systems by retrieving corrections from a database of likely alternatives. Retrieval keyed on the ASR hypothesis text loses precision when that hypothesis is phonetically dissimilar to the true transcript. The proposed method avoids this mismatch by querying the correction database with embeddings derived directly from the utterance audio. Multimodal speech-text embedding networks place the audio of an utterance and the text of its transcript close together in a shared embedding space, so nearest-neighbor search over that space can retrieve the intended correction with better recall and precision. The study reports a 6% relative reduction in word error rate (WER) on utterances whose transcripts appear in the candidate set, without increasing WER on general utterances.
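The retrieval step described above can be sketched as brute-force nearest-neighbor search in a shared embedding space. This is a minimal sketch under stated assumptions, not the authors' implementation: the hard-coded vectors stand in for the outputs of trained audio and text encoders, cosine distance is an assumed metric, and the names `nearest_candidate`, `augment_nbest`, and the `max_distance` threshold are hypothetical. The threshold shows one way a distance score could gate which retrieved candidates enter the n-best list, trading some recall for precision.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity; smaller means the embeddings are closer."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def nearest_candidate(audio_emb: Vector,
                      candidate_embs: Dict[str, Vector]) -> Tuple[str, float]:
    """Brute-force nearest-neighbor search over the correction database.

    Assumption: candidate_embs maps each candidate transcript to the
    embedding its text encoder would produce in the shared speech-text space.
    """
    return min(((text, cosine_distance(audio_emb, emb))
                for text, emb in candidate_embs.items()),
               key=lambda pair: pair[1])


def augment_nbest(nbest: List[str],
                  audio_emb: Vector,
                  candidate_embs: Dict[str, Vector],
                  max_distance: float) -> List[str]:
    """Append the nearest candidate to the n-best list if it is close enough.

    Hypothetical gating rule: the candidate is added only when its
    embedding distance to the audio is at most max_distance and it is
    not already one of the ASR hypotheses.
    """
    if not candidate_embs:
        return list(nbest)
    text, dist = nearest_candidate(audio_emb, candidate_embs)
    if dist <= max_distance and text not in nbest:
        return list(nbest) + [text]
    return list(nbest)


if __name__ == "__main__":
    # Toy stand-ins for text-encoder outputs of candidate corrections.
    db = {
        "play despacito": [0.9, 0.1, 0.0],
        "call mom": [0.0, 1.0, 0.0],
        "weather today": [0.0, 0.0, 1.0],
    }
    # Toy stand-in for the audio-encoder output of the spoken query.
    audio = [0.8, 0.2, 0.0]
    print(augment_nbest(["play this pacito"], audio, db, max_distance=0.1))
```

Here the audio embedding lies close to "play despacito", so that candidate is appended even though neither ASR hypothesis spells it correctly. An audio embedding far from every candidate leaves the n-best list unchanged. A linear scan is fine for a toy database; at production scale an approximate nearest-neighbor index would normally replace it.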

Publication date: 11 Jan 2024
Project Page: Not provided
Paper: https://arxiv.org/pdf/2401.04235
