This research paper examines hallucinations, a class of output errors produced by deep neural networks in Automatic Speech Recognition (ASR). Hallucinations in ASR are transcriptions that are fluent and coherent yet semantically unrelated to the source utterance. The paper proposes a perturbation-based method for assessing an ASR model's susceptibility to hallucination at test time, without requiring access to the training dataset. The method distinguishes hallucinatory from non-hallucinatory models even when their baseline word error rates are similar. The authors also devise a framework for identifying hallucinations by analysing their semantic connection to the ground truth and their fluency.
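
The summary above describes two pieces: a perturbation-based probe of hallucination susceptibility and a criterion that flags outputs which are fluent but semantically unrelated to the reference. The sketch below shows one way such a pipeline could be wired together; it is a minimal illustration under stated assumptions, not the paper's implementation. The `model_transcribe`, `embed`, and `fluency_score` hooks, the Gaussian-noise perturbation, and all thresholds are hypothetical placeholders.

```python
# Minimal sketch of a perturbation-based hallucination probe for ASR.
# The transcription, embedding, and fluency hooks are hypothetical; the
# paper's exact perturbations, metrics, and thresholds are not reproduced.
import numpy as np


def add_noise(audio: np.ndarray, snr_db: float = 10.0, seed: int = 0) -> np.ndarray:
    """Inject Gaussian noise at a target signal-to-noise ratio (in dB)."""
    rng = np.random.default_rng(seed)
    signal_power = np.mean(audio ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=audio.shape)
    return audio + noise


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))


def is_hallucination(hypothesis: str,
                     reference: str,
                     embed,                 # hypothetical: text -> embedding vector
                     fluency_score,         # hypothetical: text -> fluency in [0, 1]
                     sim_threshold: float = 0.3,
                     fluency_threshold: float = 0.5) -> bool:
    """Flag outputs that are fluent yet semantically unrelated to the reference."""
    semantic_sim = cosine_similarity(embed(hypothesis), embed(reference))
    return semantic_sim < sim_threshold and fluency_score(hypothesis) > fluency_threshold


def hallucination_rate(model_transcribe,    # hypothetical: audio array -> text
                       utterances,          # iterable of (audio, reference) pairs
                       embed,
                       fluency_score,
                       snr_db: float = 5.0) -> float:
    """Estimate susceptibility as the fraction of perturbed inputs that hallucinate."""
    flags = []
    for audio, reference in utterances:
        hypothesis = model_transcribe(add_noise(audio, snr_db=snr_db))
        flags.append(is_hallucination(hypothesis, reference, embed, fluency_score))
    return float(np.mean(flags)) if flags else 0.0
```

Because the probe only needs audio, a transcription function, and the two scoring hooks, it can be run at test time on held-out utterances, which matches the summary's claim that no access to the training dataset is required.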


Publication date: 4 Jan 2024
Project Page: Not provided
Paper: https://arxiv.org/pdf/2401.01572