Conversational Speech Recognition by Learning Audio-textual Cross-modal Contextual Representation
This paper presents a novel Conversational Automatic Speech Recognition (ASR) system that extends the Conformer encoder-decoder model with cross-modal conversational representation. The approach combines pre-trained speech and text models through…