This research evaluates the reasoning ability of Large Language Models (LLMs), which have significantly impacted areas such as natural language processing and software engineering. Although LLMs perform well on textual and numerical reasoning benchmarks, the study questions the depth of that ability: the models struggle with more complex problems that require sequential decision-making and common-sense planning. To probe this, the study uses an Inductive Logic Programming (ILP) benchmark, a challenging test that demands strict cause-and-effect reasoning. The results suggest that LLMs, despite their scale, reason poorly on this benchmark compared to much smaller neural program induction systems.
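To make the benchmark style concrete, here is a minimal sketch of what an ILP task looks like: inducing a logical rule from background facts and positive/negative examples. The facts, the `grandparent` target relation, and the candidate rule below are illustrative assumptions for exposition, not tasks taken from the paper's benchmark.

```python
# Illustrative ILP-style task (hypothetical data, not from the paper's benchmark).

# Background knowledge: ground facts of the parent/2 relation as a set of tuples.
parent = {
    ("ann", "bob"),
    ("bob", "carol"),
    ("bob", "dave"),
    ("eve", "frank"),
}

# Positive and negative examples of the target relation grandparent/2.
positives = {("ann", "carol"), ("ann", "dave")}
negatives = {("bob", "carol"), ("eve", "frank")}

def hypothesis(x, z):
    """Candidate rule: grandparent(X, Z) :- parent(X, Y), parent(Y, Z)."""
    return any(
        (x, y) in parent and (y, z) in parent
        for y in {b for (_, b) in parent}
    )

# An ILP system searches for a rule like `hypothesis` that entails every
# positive example and no negative example; here we only verify one candidate.
assert all(hypothesis(x, z) for (x, z) in positives)
assert not any(hypothesis(x, z) for (x, z) in negatives)
print("Hypothesis is consistent with all examples.")
```

The hard part, which the sketch omits, is the search: an ILP system must find such a rule in a combinatorial space of candidate clauses, and the rule must hold exactly, which is the strict cause-and-effect requirement the abstract refers to.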

Publication date: 18 Jan 2024
Project Page: https://doi.org/XXXXXXX.XXXXXXX
Paper: https://arxiv.org/pdf/2401.09042