The paper introduces CLC (Contrastive Learning for Conversations), a method for improving Automatic Speech Recognition (ASR) models. CLC uses artifacts of unsuccessful conversations with assistant systems as a signal for self-supervised learning. The authors show that the method improves ASR performance by up to 19.2% on OD3, a new public, large-scale, semi-synthetic meta-dataset of audio task-oriented dialogues. The gains also transfer to real-world systems, where performance improves by up to 6.7% over baselines.
Publication date: 5 Jan 2024
Project Page: https://github.com/amazon-science/amazon-od3
Paper: https://arxiv.org/pdf/2401.02417