This paper explores the use of connectionist techniques in phonetic speech recognition with strong latency constraints. The aim is to enhance telephone conversations for hearing-impaired people by building a system that classifies speech into a sequence of phonetic/visemic units, which can then drive synchronised lip movements in a synthetic talking face, or avatar. Particular attention is paid to the interaction between the time-evolution model learnt by the multi-layer perceptrons and the transition model imposed by the Viterbi decoder under different latency conditions.
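
To make the latency trade-off concrete, below is a minimal Python sketch (not the paper's implementation) of frame-synchronous Viterbi decoding over per-frame phone posteriors, where each label is committed after a fixed lookahead of future frames. The function name `decode_online`, the `lookahead` parameter, and the uniform initial distribution are illustrative assumptions; `log_trans` stands in for the transition model imposed by the decoder.

```python
import numpy as np

def decode_online(posteriors, log_trans, lookahead=5):
    """Frame-synchronous Viterbi decoding with a fixed lookahead.

    posteriors : (T, N) per-frame phone posteriors from the MLP
    log_trans  : (N, N) log transition probabilities (the decoder's model)
    lookahead  : frames of future context allowed before a label is emitted
                 (assumed 1 <= lookahead < T); smaller values mean lower
                 latency but less evidence per decision.
    """
    T, N = posteriors.shape
    # In hybrid HMM/ANN decoding the posteriors are typically divided by the
    # phone priors to obtain scaled likelihoods; omitted here for brevity.
    log_obs = np.log(posteriors + 1e-12)

    delta = np.empty((T, N))           # best partial-path log scores
    psi = np.zeros((T, N), dtype=int)  # backpointers
    delta[0] = log_obs[0]
    labels = []

    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans  # rows: from-state, cols: to-state
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_obs[t]

        # Commit the label of frame (t - lookahead) by backtracking from the
        # currently best hypothesis, capping the output delay at `lookahead` frames.
        if t >= lookahead:
            state = int(delta[t].argmax())
            for k in range(t, t - lookahead, -1):
                state = int(psi[k, state])
            labels.append(state)

    # Flush the last `lookahead` frames once the utterance ends.
    state = int(delta[T - 1].argmax())
    tail = [state]
    for k in range(T - 1, T - lookahead, -1):
        state = int(psi[k, state])
        tail.append(state)
    labels.extend(reversed(tail))
    return labels
```

Shrinking `lookahead` reduces the output delay of the avatar's lip movements but gives the decoder's transition model fewer future frames with which to resolve locally ambiguous MLP posteriors, which is essentially the latency/accuracy interaction the paper investigates.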

Publication date: 15 Jan 2024
DOI: https://doi.org/10.1016/j.specom.2005.05.005
Paper: https://arxiv.org/pdf/2401.06588