This article examines how generative models of expressive piano performance are evaluated. Such models are usually assessed by comparing their predictions to a human performance. The study presents experiments using high-quality performances of classical piano music, together with a listening test. The results indicate that listeners can sometimes perceive subtle performance differences that go unnoticed under quantitative evaluation. The authors discuss what these findings imply for quantitative evaluation and hope to foster, within the music information retrieval community, a critical appreciation of the uncertainties involved in such assessments.

Publication date: 2023-11-10
Project Page: https://doi.org/10.1145/3625135.3625141
Paper: https://arxiv.org/pdf/2401.00471