This paper addresses match-mismatch classification: deciding whether a speech segment corresponds to a simultaneously recorded EEG segment. The authors extract spatiotemporal features from the EEG with a deep convolutional network and use contrastive learning to align these features with representations of the speech stimulus. They find that both self-supervised speech representations and contextual text embeddings improve performance. With feature fusion and model ensembling, the approach reaches an accuracy of 60.29%, ranking second in the Auditory EEG Challenge.
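The contrastive matching idea can be sketched as follows. This is a minimal NumPy illustration, not the authors' architecture: it assumes precomputed fixed-size embeddings (the hypothetical `eeg_emb` and `speech_candidates` arrays), scores candidate speech segments by cosine similarity to the EEG embedding, and uses an InfoNCE-style loss in which the matched pair is the positive and the other candidates are negatives.

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two 1-D embedding vectors.
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def match_mismatch_predict(eeg_emb, speech_candidates):
    # Predict the candidate speech segment most similar to the EEG embedding.
    sims = np.array([cosine_sim(eeg_emb, s) for s in speech_candidates])
    return int(np.argmax(sims)), sims

def info_nce_loss(eeg_emb, speech_candidates, match_idx, temperature=0.1):
    # InfoNCE-style contrastive loss: the matched speech segment is the
    # positive; all other candidates serve as negatives.
    sims = np.array([cosine_sim(eeg_emb, s) for s in speech_candidates])
    logits = sims / temperature
    log_probs = logits - np.log(np.sum(np.exp(logits)))
    return -log_probs[match_idx]

# Toy example: candidate 0 is the "matched" speech segment, so the EEG
# embedding is constructed to be correlated with it.
rng = np.random.default_rng(0)
speech = rng.normal(size=(2, 64))             # two candidate speech embeddings
eeg = speech[0] + 0.1 * rng.normal(size=64)   # noisy copy of candidate 0
pred, sims = match_mismatch_predict(eeg, speech)
```

In training, minimizing this loss pulls matched EEG/speech pairs together in the shared embedding space while pushing mismatched pairs apart, which is the mechanism the paper relies on to relate EEG features to speech features.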


Publication date: 11 Jan 2024
Project Page: Not provided
Paper: https://arxiv.org/pdf/2401.04964