This technical report from Microsoft presents the multilingual E5 text embedding models (multilingual-e5-small, -base, and -large). The models are first trained with contrastive pre-training on approximately 1 billion multilingual text pairs, then fine-tuned on a combination of labeled datasets. The report also introduces an instruction-tuned embedding model, multilingual-e5-large-instruct, whose performance is on par with state-of-the-art English-only models of similar size. All models were evaluated on the MTEB benchmark and the MIRACL multilingual retrieval benchmark, where they show competitive performance.
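As a minimal usage sketch, not taken from the report itself: the snippet below shows how one of the released checkpoints might be queried for retrieval, assuming the sentence-transformers library and the intfloat/multilingual-e5-base checkpoint from the Hugging Face Hub. The "query: " / "passage: " input prefixes follow the documented E5 convention, and the example texts are illustrative.

```python
# A minimal sketch, assuming sentence-transformers and the
# intfloat/multilingual-e5-base checkpoint; E5 models expect a
# "query: " or "passage: " prefix on every input text.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/multilingual-e5-base")

# Illustrative query and candidate passages (inputs in other languages work too).
queries = ["query: how much protein should a female eat"]
passages = [
    "passage: As a general guideline, the CDC's average protein requirement "
    "for women ages 19 to 70 is 46 grams per day.",
    "passage: Der Gipfel ist der höchste Punkt eines Berges.",
]

# Embeddings are L2-normalized, so the dot product equals cosine similarity.
query_emb = model.encode(queries, normalize_embeddings=True)
passage_emb = model.encode(passages, normalize_embeddings=True)
scores = query_emb @ passage_emb.T
print(scores)  # higher score = more relevant passage
```

Note that the instruction-tuned variant expects a different input format: per its model card, a short task description is prepended to each query ("Instruct: {task}\nQuery: {query}") instead of the plain "query: " prefix.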

Publication date: 8 Feb 2024
Project Page: https://github.com/microsoft/unilm/tree/master/e5
Paper: https://arxiv.org/pdf/2402.05672