The article presents a single model for multilingual audio-visual speech recognition. The researchers were inspired by the human cognitive system’s ability to distinguish between languages without conscious effort. They designed a model that identifies the language of the input speech by capturing the inherent similarities and differences between languages. This work contributes to the development of robust and efficient multilingual audio-visual speech recognition systems and reduces the need for language-specific models.
Publication date: 25 Oct 2023
Project Page: N/A
Paper: https://arxiv.org/pdf/2310.14946