SQUARE: Automatic Question Answering Evaluation using Multiple Positive and Negative References

The paper introduces SQuArE (Sentence-level QUestion AnsweRing Evaluation), a new metric for evaluating Question Answering (QA) systems. Existing evaluation approaches, such as human annotation, are expensive and difficult to carry out. Recent work has shown that similarity metrics based on transformer LM encoders transfer well to QA evaluation, but these metrics are limited because they rely on a single correct reference answer. SQuArE addresses this by using multiple reference answers, both correct and incorrect, which improves the accuracy of its predictions. Evaluated across several QA systems and datasets, SQuArE outperformed previous methods.
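
To make the idea of scoring against both correct and incorrect references concrete, here is a minimal Python sketch. It is not the SQuArE model: the summary describes metrics built on transformer LM encoders, whereas this toy uses plain token overlap as a stand-in similarity. The function names (`tokens`, `jaccard`, `multi_reference_score`), the "best positive minus best negative" rule, and the example sentences are all assumptions made for illustration.

```python
import re


def tokens(text):
    """Lowercased word tokens as a set (a crude stand-in for an encoder)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Token-overlap similarity in [0, 1]; hypothetical substitute for a learned metric."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def multi_reference_score(candidate, positives, negatives, sim=jaccard):
    """Similarity to the closest correct reference minus similarity to the
    closest incorrect reference. An illustrative rule, not the paper's."""
    best_pos = max((sim(candidate, ref) for ref in positives), default=0.0)
    best_neg = max((sim(candidate, ref) for ref in negatives), default=0.0)
    return best_pos - best_neg


# Made-up references for the question "What is the capital of France?"
positives = ["Paris is the capital of France.", "The capital of France is Paris."]
negatives = ["Lyon is the capital of France."]

good = multi_reference_score("The capital is Paris.", positives, negatives)
bad = multi_reference_score("The capital is Lyon.", positives, negatives)
print(f"correct answer: {good:+.3f}, incorrect answer: {bad:+.3f}")
```

In this toy setup the correct answer scores above zero and the incorrect one below it. If the negative references are dropped, the incorrect answer still receives a clearly positive score because it shares most of its words with the correct references, which is the kind of weakness that using only correct references can leave open.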

Publication date: 21 Sep 2023
Project Page: https://arxiv.org/abs/2309.12250v1
Paper: https://arxiv.org/pdf/2309.12250
