The paper presents a deep complex hybrid transformer for speech enhancement that combines processing in the spectrogram and waveform domains. The model pairs a complex Swin-Unet operating on the spectrogram with a dual-path transformer network operating on the waveform, so that it learns features from both domains for noise reduction. The authors report improved performance on the BirdSoundsDenoising and VCTK+DEMAND datasets, and the study suggests that this hybrid approach can improve the quality and intelligibility of speech.
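
To make the two domains in the summary above concrete, here is a minimal pure-Python sketch of the two input views such a hybrid model could consume: a complex spectrogram (the kind of input a complex spectrogram-domain network works on) and overlapping waveform chunks (the usual input layout for dual-path models). This is not the paper's preprocessing. The function names `stft` and `chunk_waveform`, the parameters `frame_len`, `hop`, and `chunk_len`, and the toy 8 Hz test tone are all assumptions chosen for illustration.

```python
import cmath
import math


def stft(signal, frame_len=8, hop=4):
    """Naive complex STFT: Hann-windowed frames, one-sided DFT per frame.

    Returns a list of frames, each a list of frame_len // 2 + 1 complex bins.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        bins = [
            sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(bins)
    return frames


def chunk_waveform(signal, chunk_len=8, hop=4):
    """Split a waveform into overlapping chunks (hypothetical dual-path layout).

    A dual-path model would typically alternate between attention within each
    chunk and attention across chunks; only the segmentation is shown here.
    """
    return [signal[s:s + chunk_len]
            for s in range(0, len(signal) - chunk_len + 1, hop)]


# Toy input: one second of an 8 Hz sine sampled at 64 Hz (assumed values).
sample_rate = 64
noisy = [math.sin(2 * math.pi * 8 * t / sample_rate) for t in range(sample_rate)]

spec = stft(noisy)              # spectrogram-domain view (complex values)
chunks = chunk_waveform(noisy)  # waveform-domain view (real values)

print(len(spec), len(spec[0]))      # number of frames, bins per frame
print(len(chunks), len(chunks[0]))  # number of chunks, samples per chunk
```

In a real system each view would feed its own branch and the branch outputs would be fused. How the paper fuses them is not stated in this summary, so the sketch stops at the inputs.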

 

Publication date: 3 Nov 2023
Project Page: Not Provided
Paper: https://arxiv.org/pdf/2310.19602