The article discusses the limitations of conventional audio classification methods and introduces a novel method that incorporates counterfactual analysis. The proposed model jointly considers acoustic characteristics and sound-source information drawn from human-annotated reference texts, and it generates counterfactual instances so that models learn to recognize sound events and sources under alternative scenarios. The method was validated by pre-training on multiple audio-captioning datasets and evaluating on several common downstream tasks, where it yielded a significant improvement in top-1 accuracy on open-ended, language-based audio retrieval.
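The paper's exact training objective is not detailed in this summary, but the core idea of using counterfactual instances as hard negatives in contrastive audio-text pre-training can be sketched as follows. This is a minimal illustration, assuming an InfoNCE-style loss in which each audio clip is contrasted against its true caption, in-batch negatives, and a counterfactual caption (e.g., the same caption with the sound source swapped); the function name and signature are hypothetical, not from the paper.

```python
import numpy as np

def contrastive_loss_with_counterfactuals(audio_emb, text_emb, cf_text_emb,
                                          temperature=0.07):
    """InfoNCE-style loss: each audio clip is matched against its true caption
    (column i for row i), in-batch negatives, and one counterfactual caption
    used as an extra hard negative. All inputs are (B, d) arrays."""
    # L2-normalise embeddings so dot products are cosine similarities
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    c = cf_text_emb / np.linalg.norm(cf_text_emb, axis=1, keepdims=True)

    sim = a @ t.T / temperature                    # (B, B) audio-text similarities
    sim_cf = np.sum(a * c, axis=1) / temperature   # (B,) audio vs. its counterfactual

    # Append the counterfactual similarity as one extra negative column
    logits = np.concatenate([sim, sim_cf[:, None]], axis=1)  # (B, B+1)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(a))
    return -np.mean(log_probs[idx, idx])           # true caption sits on the diagonal

# Toy usage: matched audio/text embeddings should give a lower loss
# than randomly paired ones.
rng = np.random.default_rng(0)
B, d = 4, 8
audio = rng.normal(size=(B, d))
loss_matched = contrastive_loss_with_counterfactuals(audio, audio,
                                                     rng.normal(size=(B, d)))
loss_random = contrastive_loss_with_counterfactuals(audio, rng.normal(size=(B, d)),
                                                    rng.normal(size=(B, d)))
```

Because the counterfactual caption shares most of its wording with the true one, it acts as a hard negative, pushing the model to attend to the specific sound event and source rather than surface text overlap.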


Publication date: 11 Jan 2024
Project Page: Not provided
Paper: https://arxiv.org/pdf/2401.04935