This paper, ‘Uncovering Meanings of Embeddings via Partial Orthogonality’, studies how the semantic structure of language is encoded in the algebraic structure of text embeddings, the real-valued vectors that many machine learning tools use to represent text. The authors focus on a notion they call ‘semantic independence’, which captures the intuition that words like ‘eggplant’ and ‘tomato’ are independent given ‘vegetable’. They argue that partial orthogonality captures this notion: two vectors are partially orthogonal given a set of other vectors if they are orthogonal once their components along that set have been projected out. The paper also introduces independence preserving embeddings, which preserve the conditional independence structure of a distribution, and proves that such embeddings, and approximations to them, exist. Finally, it provides evidence that partial orthogonality in embeddings obeys the axioms that characterize probabilistic conditional independence, and can therefore encode semantic independence.
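To make the idea of partial orthogonality concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not the authors' code: the function names `residual`, `cosine`, and `partial_cosine` are hypothetical, and the three-dimensional vectors are hand-made toy values, not real embeddings. The sketch projects out the direction of a conditioning vector and then compares what remains.

```python
import numpy as np

def residual(v, conditioning):
    """Return the part of v orthogonal to the span of the conditioning vectors."""
    v = np.asarray(v, dtype=float)
    # Stack conditioning vectors as columns of C (shape: dim x k).
    C = np.atleast_2d(np.asarray(conditioning, dtype=float)).T
    # Least-squares coefficients give the projection of v onto span(C).
    coeffs, *_ = np.linalg.lstsq(C, v, rcond=None)
    return v - C @ coeffs

def cosine(a, b):
    """Plain cosine similarity; returns 0.0 if either vector is zero."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def partial_cosine(a, b, conditioning):
    """Cosine similarity of a and b after projecting out the conditioning set."""
    return cosine(residual(a, conditioning), residual(b, conditioning))

# Toy vectors (assumed for illustration only): both words share a
# 'vegetable' component and each has one direction of its own.
vegetable = np.array([1.0, 0.0, 0.0])
eggplant = np.array([1.0, 1.0, 0.0])
tomato = np.array([1.0, 0.0, 1.0])

print("cosine:", cosine(eggplant, tomato))
print("partial cosine given 'vegetable':",
      partial_cosine(eggplant, tomato, [vegetable]))
```

In this toy setup the two word vectors are correlated through their shared component (cosine 0.5), but their residuals are orthogonal once the ‘vegetable’ direction is removed, so they are partially orthogonal given ‘vegetable’. With real embeddings the residual similarity would rarely be exactly zero, so a practical check would need a tolerance or a comparison against a baseline.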
Publication date: 26 Oct 2023
Project Page: https://arxiv.org/abs/2310.17611
Paper: https://arxiv.org/pdf/2310.17611