🤖 AI Summary
This study investigates whether large language models (LLMs) can infer others' knowledge states and intentions, a core component of theory of mind that distinguishes humans from chimpanzees. To this end, it is the first to bring the knowledge-state tracking paradigm from cognitive anthropology into LLM evaluation, designing two story-comprehension tasks: detecting when a character's actions betray knowledge the character should not possess, and predicting a character's next actions from the character's own, possibly incomplete, knowledge rather than from objective facts they do not know. Experimental results show that current state-of-the-art LLMs perform near chance on both tasks and substantially below human participants. These findings point to a fundamental gap in LLMs' theory-of-mind capabilities and establish a new evaluation paradigm and benchmark for assessing such cognitive competencies in artificial systems.
📝 Abstract
Cognitive anthropology suggests that a distinguishing feature of human intelligence is the ability to infer other individuals' knowledge states and understand their intentions. In comparison, our closest animal relatives, chimpanzees, lack this capacity. In this paper, we evaluate LLM performance on knowledge-state tracking and estimation. We design two tasks to test (1) whether LLMs can detect when story characters, through their actions, demonstrate knowledge they should not possess, and (2) whether LLMs can predict story characters' next actions based on the characters' own knowledge vs. objective truths they do not know. Results reveal that most current state-of-the-art LLMs achieve near-random performance on both tasks and perform substantially worse than humans. We argue that future LLM research should place more weight on knowledge estimation and intention understanding.
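The abstract does not spell out the evaluation protocol, but if the two tasks are scored as binary forced-choice judgments, a minimal sketch of how accuracy might be compared against a chance baseline could look like the following. The example items, the `ask_model` stub, and the scoring are illustrative assumptions, not the paper's actual data or setup.

```python
import random

# Hypothetical illustration of scoring a two-way forced-choice
# knowledge-state tracking task against a chance baseline.
# Items and labels are invented for the sketch, not taken from the paper.
items = [
    {"story": "Ann never saw the key being moved, yet she walks straight to its new hiding spot.",
     "question": "Does Ann's action reveal knowledge she should not have?",
     "gold": "yes"},
    {"story": "Bob watched the cake being put in the fridge and later opens the fridge to get it.",
     "question": "Does Bob's action reveal knowledge he should not have?",
     "gold": "no"},
]

def ask_model(story: str, question: str) -> str:
    """Placeholder for an actual LLM call; here it guesses at random,
    which is exactly the chance baseline the results are compared to."""
    return random.choice(["yes", "no"])

correct = sum(ask_model(it["story"], it["question"]) == it["gold"] for it in items)
accuracy = correct / len(items)
chance = 0.5  # two-way forced choice
print(f"accuracy={accuracy:.2f} (chance baseline={chance})")
```

Under this kind of setup, "near-random performance" would mean model accuracy statistically indistinguishable from the 0.5 baseline, while human participants score well above it.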