The article discusses a new approach to Automated Audio Captioning (AAC) that eliminates the need for paired audio-text data. The method trains on text alone, relying on a pre-trained Contrastive Language-Audio Pretraining (CLAP) model to supply the audio-text alignment. By bridging the modality gap between CLAP's audio and text embeddings, it reaches up to 83% of the performance of fully supervised methods. This simplifies domain adaptation and mitigates the data-scarcity problem in AAC.
Publication date: 25 Sep 2023
Project Page: https://github.com/zelaki/wsac
Paper: https://arxiv.org/pdf/2309.12242
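The key difficulty in text-only training is that CLAP's audio and text embeddings for the same content are close but not identical (the "modality gap"): a decoder trained only on text embeddings may fail when fed audio embeddings at inference. The toy sketch below is not the paper's exact method; it only illustrates, with hypothetical NumPy stand-ins for CLAP embeddings, one common way to bridge the gap: perturbing text embeddings with Gaussian noise during training so the decoder's input distribution covers the region where audio embeddings land.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8  # toy dimensionality; real CLAP embeddings are 512-d

# Hypothetical stand-in for the CLAP text embedding of a caption.
text_emb = rng.normal(size=dim)
text_emb /= np.linalg.norm(text_emb)

# The paired audio embedding sits close to, but not exactly on, the text
# embedding: this offset models the "modality gap".
gap = 0.15 * rng.normal(size=dim)
audio_emb = text_emb + gap
audio_emb /= np.linalg.norm(audio_emb)

def cos(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Training-time trick: add Gaussian noise to the text embedding so the
# decoder sees inputs spread over the region where audio embeddings will
# appear at inference time.
noisy_text = text_emb + 0.15 * rng.normal(size=dim)
noisy_text /= np.linalg.norm(noisy_text)

print(f"text vs audio cosine:       {cos(text_emb, audio_emb):.3f}")
print(f"noisy text vs audio cosine: {cos(noisy_text, audio_emb):.3f}")
```

A decoder trained on such noisy text embeddings becomes robust to the offset, so at inference the CLAP audio embedding can be decoded into a caption without ever seeing paired audio-text data during training.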