The article discusses directional source separation for improving voice quality on modern smart glasses. The authors explore multiple beamformers that assist source separation modeling by strengthening the directional properties of the speech signals captured by the glasses' microphone array. They also investigate neural beamforming for multi-channel source separation, showing that automatically learning directional characteristics significantly improves separation quality. The results demonstrate that directional source separation benefits automatic speech recognition (ASR) for the wearer of the glasses, but not for the conversation partner. The study concludes with joint training of the directional source separation and ASR models, which achieves the best overall ASR performance.
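To make the beamforming idea concrete, below is a minimal sketch of a classical delay-and-sum beamformer steered toward a chosen direction for a small microphone array. The array geometry, sampling rate, and function names here are illustrative assumptions, not details from the paper (which also evaluates stronger designs such as super-directive and neural beamformers).

```python
import numpy as np

def delay_and_sum(mic_signals, mic_positions, direction, fs, c=343.0):
    """Steer a simple delay-and-sum beamformer toward `direction`.

    mic_signals:   (n_mics, n_samples) time-domain recordings
    mic_positions: (n_mics, 3) microphone coordinates in metres
    direction:     unit vector pointing from the array toward the source
    fs:            sampling rate in Hz
    c:             speed of sound in m/s
    """
    n_mics, n_samples = mic_signals.shape

    # Far-field plane-wave model: a mic displaced toward the source
    # receives the wavefront earlier by (position . direction) / c seconds.
    delays = mic_positions @ np.asarray(direction, dtype=float) / c

    # Compensate those arrival-time offsets as phase shifts in the
    # frequency domain, which implements fractional-sample delays.
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)       # (n_freqs,)
    spectra = np.fft.rfft(mic_signals, axis=1)           # (n_mics, n_freqs)
    phase = np.exp(-2j * np.pi * freqs[None, :] * delays[:, None])
    aligned = spectra * phase

    # Average the time-aligned channels and return to the time domain;
    # signals from the steered direction add coherently, others do not.
    return np.fft.irfft(aligned.mean(axis=0), n=n_samples)
```

In the pipeline summarized above, such beamformed signals serve to strengthen the directional cues given to the separation model; this sketch covers only the classical beamforming step, not the neural separation or ASR components.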

Publication date: 21 Sep 2023
Project Page: Not Provided
Paper: https://arxiv.org/pdf/2309.10993