The paper by Shreyan Chowdhury and Gerhard Widmer presents a study on expressivity-aware music performance retrieval: given a set of recorded performances of the same piece, retrieve the specific rendition that matches a described style, expressive character, or emotion. The authors find standard text-audio embedding systems suboptimal for this task and propose two improvements: emotion-enriched word embeddings (EWE) on the text side and mid-level perceptual features on the audio side. Their results show that these features capture musical expression effectively in a cross-modal setting and offer a route toward explainability in retrieval and recommendation.
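The retrieval step itself reduces to nearest-neighbor search in a shared embedding space: embed the text query and each performance, then rank performances by similarity. The sketch below illustrates this with cosine similarity over toy vectors; the embedding dimensions and values are hypothetical stand-ins for what the paper's text and audio encoders would produce, not the authors' actual pipeline.

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_performances(query_emb: np.ndarray, performance_embs: list) -> list:
    """Return performance indices sorted by similarity to the text query."""
    sims = [cosine_sim(query_emb, e) for e in performance_embs]
    return sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)

# Toy 3-dim shared space (hypothetical). In the paper, the text side would
# come from an EWE-based encoder and the audio side from mid-level features.
query = np.array([0.9, 0.1, 0.0])            # e.g. "passionate, intense"
performances = [
    np.array([0.10, 0.80, 0.20]),  # rendition A
    np.array([0.95, 0.05, 0.00]),  # rendition B
    np.array([0.00, 0.20, 0.90]),  # rendition C
]
print(rank_performances(query, performances))  # → [1, 0, 2]: rendition B first
```

Because mid-level perceptual features (e.g. articulation, rhythmic stability) are human-interpretable, inspecting which audio dimensions drive the similarity score is what opens the door to the explainability the authors mention.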
Publication date: 31 Jan 2024
DOI: https://doi.org/10.1145/3632754.3632761
Paper: https://arxiv.org/pdf/2401.14826