The paper introduces TDFNet, a model for audio-visual speech separation. This task matters for downstream applications such as speech recognition and assistive technologies. Whereas existing methods typically demand large parameter counts and heavy computation, TDFNet offers a more efficient alternative. Building on the architecture of TDANet, TDFNet achieves a performance increase of up to 10% over the prior method CTCNet while using fewer parameters and only 28% of CTCNet's multiply-accumulate operations (MACs). This makes TDFNet a highly effective solution to the challenges of speech separation in the audio-visual domain.
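As a rough illustration of the kind of efficiency comparison cited above (parameter count and MACs), the sketch below counts trainable parameters for two hypothetical PyTorch stand-in models; it is not the authors' code, and the real TDFNet/CTCNet architectures are far more elaborate. MACs are usually measured with a profiling tool such as thop or ptflops rather than by hand.

```python
import torch.nn as nn


def count_parameters(model: nn.Module) -> int:
    """Count trainable parameters of a model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


# Hypothetical stand-ins for a "baseline" and an "efficient" separator;
# chosen only to show how such comparisons are reported.
baseline = nn.Sequential(
    nn.Conv1d(1, 512, kernel_size=16, stride=8),
    nn.ReLU(),
    nn.Conv1d(512, 512, kernel_size=3, padding=1),
)
efficient = nn.Sequential(
    nn.Conv1d(1, 256, kernel_size=16, stride=8),
    nn.ReLU(),
    nn.Conv1d(256, 256, kernel_size=3, padding=1),
)

p_base = count_parameters(baseline)
p_eff = count_parameters(efficient)
print(f"baseline params:  {p_base:,}")
print(f"efficient params: {p_eff:,} ({100 * p_eff / p_base:.0f}% of baseline)")
```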


Publication date: 25 Jan 2024
Project Page: https://arxiv.org/abs/2401.14185
Paper: https://arxiv.org/pdf/2401.14185