The paper introduces OpenToM, a new benchmark for evaluating Neural Theory-of-Mind (N-ToM), a machine's ability to understand and track the mental states of others. The benchmark addresses several shortcomings of existing N-ToM benchmarks: ambiguous narratives, characters without personality traits or preferences, and limited question diversity. OpenToM uses longer, clearer narratives with explicit character traits and intentions, paired with questions designed to challenge the ability of Large Language Models (LLMs) to model characters' mental states. The study finds that while LLMs excel at modeling certain aspects of mental states in the physical world, they struggle to track characters' mental states in the psychological world.
Publication date: 8 Feb 2024
Project Page: https://github.com/seacowx/OpenToM
Paper: https://arxiv.org/pdf/2402.06044