EnigmaToM: Improve LLMs' Theory-of-Mind Reasoning Capabilities with Neural Knowledge Base of Entity States

📅 2025-03-05
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing ToM reasoning methods for large language models (LLMs) are inefficient and generalize poorly to higher-order theory-of-mind (ToM) reasoning, particularly multi-hop belief tracking and perspective shifting, because they rely excessively on the LLM itself. To address this, the paper proposes EnigmaToM, a psychology-inspired neuro-symbolic framework built on: (1) an iterative perspective-masking mechanism for accurate perspective-taking; (2) a Neural Knowledge Base of entity states (Enigma) for structured belief representation and knowledge injection; and (3) spatial scene graphs as an inductive bias for belief tracking across ToM orders. By combining symbolic structure with neural generalization, the method significantly improves high-order ToM accuracy on benchmarks including ToMi, HiToM, and FANToM, with especially large gains for 7B–13B parameter models. The result is a more scalable and interpretable approach to ToM reasoning.

📝 Abstract
Theory-of-Mind (ToM), the ability to infer others' perceptions and mental states, is fundamental to human interaction but remains a challenging task for Large Language Models (LLMs). While existing ToM reasoning methods show promise with reasoning via perceptual perspective-taking, they often rely excessively on LLMs, reducing their efficiency and limiting their applicability to high-order ToM reasoning, which requires multi-hop reasoning about characters' beliefs. To address these issues, we present EnigmaToM, a novel neuro-symbolic framework that enhances ToM reasoning by integrating a Neural Knowledge Base of entity states (Enigma) for (1) a psychology-inspired iterative masking mechanism that facilitates accurate perspective-taking and (2) knowledge injection that elicits key entity information. Enigma generates structured representations of entity states, which construct spatial scene graphs -- leveraging spatial information as an inductive bias -- for belief tracking of various ToM orders and enhancing events with fine-grained entity state details. Experimental results on multiple benchmarks, including ToMi, HiToM, and FANToM, show that EnigmaToM significantly improves ToM reasoning across LLMs of varying sizes, particularly excelling in high-order reasoning scenarios.
Problem

Research questions and friction points this paper is trying to address.

Enhance LLMs' Theory-of-Mind reasoning capabilities
Address inefficiency in high-order ToM reasoning
Reduce over-reliance on LLM calls by integrating a Neural Knowledge Base of entity states
Innovation

Methods, ideas, or system contributions that make the work stand out.

Neural Knowledge Base for entity states
Psychology-inspired iterative masking mechanism
Spatial scene graphs for belief tracking
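The masking-based belief tracking described above can be illustrated with a minimal toy sketch. This is not the paper's implementation; the `Event` structure, function names, and the Sally-Anne style story are illustrative assumptions. The core idea it demonstrates: a character's belief about an object's location is computed from only the events that character witnessed, and nested (higher-order) beliefs fall out of applying the mask iteratively, one observer at a time.

```python
# Hypothetical sketch of iterative perspective masking for belief tracking.
# All names here (Event, mask, believed_location) are illustrative, not from the paper.
from dataclasses import dataclass

@dataclass
class Event:
    room: str        # where the event happens
    obj: str         # object being moved
    target: str      # container the object ends up in
    present: set     # characters in the room when it happens

def mask(events, observer):
    """Keep only events the observer perceived (was present for)."""
    return [e for e in events if observer in e.present]

def believed_location(events, *observers):
    """Apply the perspective mask once per observer, so
    believed_location(evts, 'Anne', 'Sally') answers a second-order
    question: where does Anne think Sally thinks the object is?"""
    for obs in observers:
        events = mask(events, obs)
    loc = None
    for e in events:
        loc = e.target   # the last witnessed move determines the belief
    return loc

# Sally-Anne style story: the ball goes into the basket while both watch,
# then Anne moves it to the box after Sally has left the room.
story = [
    Event("room", "ball", "basket", {"Sally", "Anne"}),
    Event("room", "ball", "box",    {"Anne"}),
]
print(believed_location(story, "Anne"))           # box
print(believed_location(story, "Sally"))          # basket
print(believed_location(story, "Anne", "Sally"))  # basket (second-order)
```

In the paper's framework the analogous state would come from Enigma's structured entity states and the spatial scene graph rather than a hand-written event list; the sketch only shows why iterating the mask supports beliefs of arbitrary order.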