uSee: Unified Speech Enhancement and Editing with Conditional Diffusion Models

The paper introduces uSee, a Unified Speech Enhancement and Editing model that uses conditional diffusion models to handle several speech tasks within a single model. Generation is controlled by supplying multiple types of conditions, including self-supervised learning (SSL) embeddings and text prompts. The paper reports that uSee outperforms other generative speech enhancement models on speech denoising and dereverberation. It can also edit speech given a text description of the desired environmental sound, a signal-to-noise ratio (SNR), and a room impulse response (RIR).
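To make the SNR and RIR conditions mentioned above concrete, here is a minimal NumPy sketch of what they mean at the signal level: noise is scaled so the mixture reaches a target SNR, and reverberation is simulated by convolving speech with an impulse response. This is not the uSee model or its code. The function names (`mix_at_snr`, `apply_rir`, `measured_snr_db`) and the toy signals are assumptions made purely for illustration.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so the speech-to-noise power ratio equals snr_db.

    Assumes noise is at least as long as speech (hypothetical helper).
    """
    noise = noise[: len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # SNR (dB) = 10 * log10(speech_power / noise_power), solved for noise power.
    target_noise_power = speech_power / (10 ** (snr_db / 10))
    scale = np.sqrt(target_noise_power / noise_power)
    return speech + scale * noise


def apply_rir(speech, rir):
    """Simulate reverberation by convolving speech with a room impulse response."""
    return np.convolve(speech, rir)[: len(speech)]


def measured_snr_db(speech, noisy):
    """Recover the SNR of a mixture when the clean speech is known."""
    residual = noisy - speech
    return 10 * np.log10(np.mean(speech ** 2) / np.mean(residual ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(16000) / 16000.0
    speech = np.sin(2 * np.pi * 220 * t)   # toy stand-in for a speech signal
    noise = rng.standard_normal(16000)     # toy stand-in for environmental noise
    rir = np.array([1.0, 0.0, 0.5])        # toy impulse response with one echo

    noisy = mix_at_snr(speech, noise, snr_db=5.0)
    reverberant = apply_rir(speech, rir)
    print(round(measured_snr_db(speech, noisy), 3), reverberant.shape)
```

In this framing, enhancement runs in the opposite direction (recovering clean speech from a noisy or reverberant signal), while editing moves a recording toward a chosen SNR or room response.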

Publication date: 4 Oct 2023
Project Page: https://muqiaoy.github.io/usee
Paper: https://arxiv.org/pdf/2310.00900
