OpenToM: A Comprehensive Benchmark for Evaluating Theory-of-Mind Reasoning Capabilities of Large Language Models

The paper introduces OpenToM, a new benchmark for evaluating Neural Theory-of-Mind (N-ToM), a machine’s ability to understand and track the mental states of others. The benchmark addresses several shortcomings of existing N-ToM benchmarks: ambiguous narratives, characters that lack personality traits and preferences, and limited question diversity. OpenToM uses longer, clearer narratives in which characters have explicit personality traits and intentions, paired with questions designed to challenge the ability of Large Language Models (LLMs) to model characters’ mental states. The study finds that while LLMs excel at modeling certain aspects of mental states in the physical world, they struggle to track characters’ mental states in the psychological world.

Publication date: 8 Feb 2024
Project Page: https://github.com/seacowx/OpenToM
Paper: https://arxiv.org/pdf/2402.06044
