This article introduces SICRN, a speech enhancement model that combines a state space model with inplace convolution to improve speech quality, especially in noisy environments. The state space model captures global frequency dependencies and long-term temporal dependencies, while 2D inplace convolution captures local structure. SICRN is reported to outperform traditional models in parameter count, computational cost, and algorithmic delay, which makes it a promising approach to speech enhancement.
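To make the two building blocks concrete, here is a minimal NumPy sketch. It is not the SICRN implementation. The function names `ssm_scan` and `inplace_conv2d`, the matrices `A`, `B`, `C`, `D`, and the padding scheme are all assumptions chosen for illustration. The sketch assumes that "inplace" means a stride-1 convolution that keeps the time-frequency shape unchanged, and that the state space model is a plain discrete linear recurrence.

```python
import numpy as np


def ssm_scan(u, A, B, C, D):
    """Run a discrete linear state space model over a 1-D input sequence.

    x[k] = A @ x[k-1] + B * u[k]
    y[k] = C @ x[k]   + D * u[k]

    The hidden state x carries information from all past inputs, which is
    how a state space layer can model long-range dependencies.
    """
    x = np.zeros(A.shape[0])
    y = np.empty(len(u))
    for k, u_k in enumerate(u):
        x = A @ x + B * u_k
        y[k] = C @ x + D * u_k
    return y


def inplace_conv2d(spec, kernel):
    """Stride-1 2-D convolution that preserves the (time, freq) shape.

    Assumptions: the kernel has an odd width along frequency, the time axis
    is padded on the past side only (causal), and the frequency axis is
    padded symmetrically. As in most deep learning code, the kernel is not
    flipped (cross-correlation).
    """
    kt, kf = kernel.shape
    padded = np.pad(spec, ((kt - 1, 0), (kf // 2, kf // 2)))
    n_frames, n_bins = spec.shape
    out = np.empty((n_frames, n_bins), dtype=float)
    for t in range(n_frames):
        for f in range(n_bins):
            out[t, f] = np.sum(padded[t:t + kt, f:f + kf] * kernel)
    return out


# Impulse response of a one-state model: the output decays geometrically.
A = np.array([[0.5]])
B = np.array([1.0])
C = np.array([1.0])
impulse = np.array([1.0, 0.0, 0.0, 0.0])
print(ssm_scan(impulse, A, B, C, D=0.0))  # [1.    0.5   0.25  0.125]

# The convolution output keeps the input's (frames, bins) shape.
spec = np.arange(40, dtype=float).reshape(5, 8)
print(inplace_conv2d(spec, np.ones((2, 3))).shape)  # (5, 8)
```

The recurrence reaches arbitrarily far back in time through its state, while the convolution only sees a small neighborhood of each time-frequency bin, which mirrors the global/local split described above. Because nothing is downsampled and no future frames are read, a layer of this kind adds no look-ahead delay in this sketch.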

Publication date: 23 Feb 2024
Project Page: Not provided
Paper: https://arxiv.org/pdf/2402.14225