This paper examines the reasoning capabilities of Large Language Models (LLMs), focusing on GPT-4. Early optimism that reasoning might emerge automatically with scale has been tempered by examples where LLMs fail. The paper systematically investigates the effectiveness of iteratively prompting LLMs, in which the model critiques and refines its own candidate answers, on Graph Coloring, a canonical NP-complete reasoning problem. The study finds that LLMs are ineffective both at solving graph coloring instances and at verifying candidate solutions. These results challenge claims about the self-critiquing capabilities of state-of-the-art LLMs.
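To make the verification task concrete, here is a minimal sketch (not from the paper; the function and example graph are illustrative assumptions) of the check the study asks LLMs to perform: deciding whether a proposed coloring of a graph is valid, i.e. no edge joins two same-colored vertices.

```python
# Illustrative sketch, not the paper's code: checking a proposed graph coloring.

def is_valid_coloring(edges, coloring):
    """Return True if no edge connects two vertices with the same color."""
    return all(coloring[u] != coloring[v] for u, v in edges)

# Hypothetical example: a 4-cycle, which is 2-colorable.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(is_valid_coloring(edges, {0: "red", 1: "blue", 2: "red", 3: "blue"}))  # True
print(is_valid_coloring(edges, {0: "red", 1: "red", 2: "red", 3: "blue"}))   # False
```

A check like this is trivial for an exact verifier, which underscores the paper's point: an LLM's unreliability at even this verification step undermines the value of having it critique its own solutions.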
Publication date: 19 Oct 2023
Project Page: https://arxiv.org/abs/2310.12397v1
Paper: https://arxiv.org/pdf/2310.12397