This research seeks to understand what information is encoded in the vector representations of speech computed by a pretrained model. The study proposes an unsupervised method that applies ABX tests to audio recordings to determine whether the representations computed by a multilingual speech model encode a given characteristic. The method was tested on three kinds of factors: room acoustics, linguistic genre, and phonetic aspects. The findings suggest that ABX tests can reveal differences in the acoustic setup, voice properties, or linguistic content. The study thus offers a new way to detect such factors in corpora intended for unsupervised learning, and a means to classify recordings for which the corresponding metadata are unavailable.
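The following is a minimal sketch of a generic ABX discrimination score. It is not the authors' exact protocol. It assumes that each recording has already been reduced to one fixed-size vector (for example, by pooling a pretrained model's frame outputs) and that cosine distance is the comparison metric. The function names `cosine_distance`, `abx_score`, and `symmetric_abx` are hypothetical. For two categories A and B, the score counts the triplets (a, b, x), with a and x distinct members of A and b a member of B, in which x lies closer to a than to b. Ties count as one half. A score near 0.5 means the representations do not separate the two categories. A score near 1.0 means they do.

```python
import math
from itertools import product
from typing import List, Sequence

Vector = Sequence[float]


def cosine_distance(u: Vector, v: Vector) -> float:
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def abx_score(cat_a: List[Vector], cat_b: List[Vector]) -> float:
    """Fraction of (a, b, x) triplets, a != x both from cat_a and b from cat_b,
    where x is closer to a than to b. Ties count as 0.5."""
    total = 0.0
    count = 0
    for (i, a), b, (j, x) in product(enumerate(cat_a), cat_b, enumerate(cat_a)):
        if i == j:  # a and x must be different recordings
            continue
        d_ax = cosine_distance(a, x)
        d_bx = cosine_distance(b, x)
        if d_ax < d_bx:
            total += 1.0
        elif d_ax == d_bx:
            total += 0.5
        count += 1
    if count == 0:
        raise ValueError("cat_a needs at least two vectors and cat_b at least one")
    return total / count


def symmetric_abx(cat_a: List[Vector], cat_b: List[Vector]) -> float:
    """Average the score over both directions (A vs B and B vs A)."""
    return 0.5 * (abx_score(cat_a, cat_b) + abx_score(cat_b, cat_a))


# Hypothetical pooled embeddings of recordings made in two different rooms.
room_1 = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0]]
room_2 = [[0.1, 1.0], [0.2, 0.9], [0.0, 1.0]]
print(symmetric_abx(room_1, room_2))  # 1.0: the two conditions are fully separated
```

Because the score needs only category labels for the recordings being compared and no trained classifier, this kind of test fits the unsupervised setting the summary describes.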
Publication date: 9 Feb 2024
Project Page: Unknown
Paper: https://arxiv.org/pdf/2402.05581