The study by Geetanjali Rakshit and Jeffrey Flanigan investigates the robustness of Question Answering (QA) models on figurative text. The researchers propose FigurativeQA, a set of 1,000 yes/no questions with figurative and non-figurative contexts, extracted from restaurant and product reviews. The study finds that BERT-based QA models show an average performance drop of up to 15 percentage points when answering questions from figurative contexts, compared with non-figurative ones. Generative models such as GPT-3 and ChatGPT handle figurative text better than the BERT-based models, and their performance improves further when the figurative contexts are automatically simplified into their non-figurative counterparts.
Publication date: 24 Sep 2023
Project Page: https://arxiv.org/abs/2309.13748v1
Paper: https://arxiv.org/pdf/2309.13748
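
The drop described above is a difference between two accuracies, stated in percentage points rather than as a relative percentage. A minimal Python sketch of that arithmetic follows. The record layout, the function names `accuracy` and `figurative_gap`, and the toy predictions are all assumptions made for illustration; they are not taken from the paper or from the FigurativeQA data.

```python
from typing import List, Tuple

# Hypothetical record layout: (is_figurative, gold_answer, predicted_answer).
Record = Tuple[bool, str, str]


def accuracy(records: List[Record]) -> float:
    """Fraction of records whose predicted yes/no answer matches the gold answer."""
    if not records:
        raise ValueError("no records to score")
    correct = sum(1 for _, gold, pred in records if gold == pred)
    return correct / len(records)


def figurative_gap(records: List[Record]) -> float:
    """Non-figurative accuracy minus figurative accuracy, in percentage points."""
    figurative = [r for r in records if r[0]]
    non_figurative = [r for r in records if not r[0]]
    return 100.0 * (accuracy(non_figurative) - accuracy(figurative))


# Toy predictions invented for this example; they are not results from the study.
toy = [
    (True, "yes", "yes"),
    (True, "no", "yes"),
    (True, "yes", "no"),
    (True, "no", "no"),
    (False, "yes", "yes"),
    (False, "no", "no"),
    (False, "yes", "yes"),
    (False, "no", "yes"),
]

print(f"gap: {figurative_gap(toy):.1f} percentage points")
```

On the toy data the figurative subset scores 2 of 4 and the non-figurative subset 3 of 4, so the gap is 25 percentage points. The same subtraction, applied to real model predictions, would give the kind of figure the summary reports.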