The study presents a dataset for evaluating whether language models understand the rationale behind critical reasoning, using subquestions that ask why each answer option of a main question should be selected or eliminated. Results show that recent large language models struggle to answer these subquestions even when they answer the corresponding main questions correctly. In particular, the models perform poorly on subquestions written about the incorrect options of the main questions, indicating a limited ability to explain why incorrect alternatives should be eliminated. The dataset encourages further investigation into the critical reasoning ability of language models, with a focus on the process of eliminating relevant alternatives.

Publication date: 1 Dec 2023
Project Page: Not given
Paper: https://arxiv.org/pdf/2311.18353