This article presents Auto-ACD, a large-scale, high-quality audio-language dataset built to address the limitations of existing audio-language datasets, namely insufficient volume and simplistic content. It also introduces an audio caption generation pipeline built on public tools or APIs. The authors demonstrate the dataset’s effectiveness by training popular models on it and showing performance improvements on various downstream tasks. The dataset, which includes over 1.9 million audio-text pairs, will be released on the project’s webpage.

Publication date: Not provided
Project Page: https://auto-acd.github.io/
Paper: https://arxiv.org/pdf/2309.11500