OpenToM: A Comprehensive Benchmark for Evaluating Theory-of-Mind Reasoning Capabilities of Large Language Models
The paper introduces OpenToM, a new benchmark for evaluating Neural Theory-of-Mind (N-ToM), a machine’s ability to understand and track the mental states of others. This benchmark addresses several shortcomings…