The paper introduces a General Audio Source Separation (GASS) model trained on a large-scale dataset to separate different audio sources such as speech, music, and sound events. The GASS models show promising results in-distribution and competitive performance out-of-distribution. The study also highlights the difficulty GASS models have in generalizing to out-of-distribution cinematic and music content. When fine-tuned, the pre-trained GASS models consistently outperform models trained without pre-training and, except for music separation, achieve state-of-the-art results on their respective benchmarks.
Publication date: 29 Sep 2023
Abstract page: https://arxiv.org/abs/2310.00140v1
Paper: https://arxiv.org/pdf/2310.00140