The study investigates how Large Language Models (LLMs) use source and reference information when evaluating machine translation quality. The researchers designed controlled experiments across various input modes (i.e., whether the source sentence, the reference translation, or both accompany the candidate translation) and model types. They found that reference information significantly improves evaluation accuracy, whereas source information is sometimes counterproductive, indicating that LLMs fail to fully exercise cross-lingual capability when evaluating translations. The findings point to a promising research direction: fully exploiting the cross-lingual capability of LLMs to improve performance on machine translation evaluation tasks.
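
Concretely, the input modes differ in which pieces of information accompany the candidate translation in the evaluation prompt. The sketch below shows one way such prompts might be assembled; the mode names, prompt wording, and 0-100 scoring scale are illustrative assumptions, not the paper's exact templates.

```python
# Hypothetical sketch of prompt construction for the input modes the study
# compares. All names and wording here are illustrative, not the paper's own.

def build_prompt(mode: str, source: str, reference: str, hypothesis: str) -> str:
    """Assemble a quality-scoring prompt for one input mode."""
    parts = ["Score the following translation from 0 to 100."]
    if mode in ("source_only", "source_and_reference"):
        parts.append(f"Source: {source}")
    if mode in ("reference_only", "source_and_reference"):
        parts.append(f"Reference: {reference}")
    parts.append(f"Translation: {hypothesis}")
    parts.append("Score:")
    return "\n".join(parts)

# Example: the reference-only mode, which the study found most reliable.
print(build_prompt(
    "reference_only",
    source="Der Hund schläft.",
    reference="The dog is sleeping.",
    hypothesis="The dog sleeps.",
))
```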

Publication date: 15 Jan 2024
Project Page: Not provided
Paper: https://arxiv.org/pdf/2401.06568