This article addresses ‘hubness’ in Sentence-BERT embeddings: a few texts become nearest ‘neighbours’ of many others, while most texts are neighbours of few or none. The effect is particularly pronounced in high-dimensional data and degrades the quality of semantic representations of text. The authors find that hubness reduction methods significantly reduce both hubness and error rates, and they conclude that mitigating hubness improves the quality of semantic spaces.
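
To make the notion of a hub concrete, the sketch below counts how often each point appears in the k-nearest-neighbour lists of the other points (its k-occurrence) and reports the skewness of those counts, which is one common way to quantify hubness. It is a minimal illustration and not necessarily the measure or pipeline used in the paper: the function names `k_occurrence` and `skewness`, the use of cosine similarity, the value `k=10`, and the random Gaussian vectors standing in for Sentence-BERT embeddings are all assumptions made for this example.

```python
import numpy as np


def k_occurrence(embeddings, k=10):
    """Count how often each point appears in other points' k-NN lists."""
    # Normalise rows so that the dot product equals cosine similarity.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = unit @ unit.T
    # A point must not count as its own neighbour.
    np.fill_diagonal(sims, -np.inf)
    # Indices of the k most similar points for every row (order not needed).
    neighbours = np.argpartition(-sims, k - 1, axis=1)[:, :k]
    return np.bincount(neighbours.ravel(), minlength=len(embeddings))


def skewness(counts):
    """Third standardised moment; large positive values indicate hubs."""
    centred = counts - counts.mean()
    return (centred ** 3).mean() / (centred ** 2).mean() ** 1.5


# Hypothetical stand-in for sentence embeddings: 500 random 300-d vectors.
rng = np.random.default_rng(0)
points = rng.standard_normal((500, 300))

counts = k_occurrence(points, k=10)
print("largest k-occurrence:", counts.max())
print("points that are nobody's neighbour:", int((counts == 0).sum()))
print("k-occurrence skewness:", skewness(counts))
```

Without hubness, every point would appear in roughly `k` neighbour lists. A long right tail in `counts`, and therefore a clearly positive skewness, means that a few points act as hubs while many others are rarely or never retrieved.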

Publication date: 1 Dec 2023
Project Page: Not provided
Paper: https://arxiv.org/pdf/2311.18364