This study investigates the problem-solving capabilities of Large Language Models (LLMs) by evaluating their performance on stumpers: unique single-step intuition problems that are challenging for human solvers but whose solutions are easily verified. Four state-of-the-art LLMs (Davinci-2, Davinci-3, GPT-3.5-Turbo, GPT-4) are compared to human participants. The results show that the newer-generation LLMs excel at solving stumpers and surpass human performance, whereas humans are better at verifying solutions to the same problems. This research advances our understanding of LLMs’ cognitive abilities and offers insights for improving their problem-solving potential across various domains.
Publication date: 25 Oct 2023
Project Page: https://github.com/Alon-Go/Stumpers-LLM
Paper: https://arxiv.org/pdf/2310.16411