GASS: Generalizing Audio Source Separation with Large-scale Data

The paper introduces a General Audio Source Separation (GASS) model, a single model trained on a large-scale dataset to separate speech, music, and sound events. The GASS models show promising in-distribution results and competitive out-of-distribution performance. They still struggle, however, to generalize to out-of-distribution cinematic and music content. Fine-tuned GASS models consistently outperform models trained without pre-training, and all of them except the music separation model achieve state-of-the-art results on their respective benchmarks.

Publication date: 29 Sep 2023
Project Page: https://arxiv.org/abs/2310.00140v1
Paper: https://arxiv.org/pdf/2310.00140
