🤖 AI Summary
Current large language models (LLMs) exhibit limited Theory of Mind (ToM) capabilities, particularly in dialogue and narrative understanding, where no explicit change in the world state occurs. To address this, we propose Shoes-of-Others (SoO) prefixing, a lightweight inference-time method that makes minimal assumptions about the context. SoO fixes the beginning of the model's output to a role-taking sentence (e.g., "Let's put ourselves in A's shoes.", where A is the target character's name) to elicit faithful, context-aware reasoning about agents' beliefs, intentions, desires, emotions, and knowledge. SoO is fully plug-and-play and domain-agnostic. Evaluated on two ToM benchmarks covering conversational and narrative contexts, SoO consistently improves accuracy across all five mental-state categories without degrading general language capabilities. To our knowledge, SoO is the first method to offer cross-scenario, low-overhead, and broadly compatible ToM enhancement for LLMs.
📝 Abstract
Recent studies have shown that Theory of Mind (ToM) in large language models (LLMs) has not yet reached human-level performance. Since fine-tuning LLMs on ToM datasets often degrades their generalization, several inference-time methods have been proposed to enhance ToM in LLMs. However, existing inference-time methods for ToM are specialized for inferring beliefs from contexts involving changes in the world state. In this study, we present a new inference-time method for ToM, Shoes-of-Others (SoO) prefixing, which makes fewer assumptions about contexts and is applicable to broader scenarios. SoO prefixing simply specifies the beginning of LLM outputs as "Let's put ourselves in A's shoes.", where A denotes the target character's name. We evaluate SoO prefixing on two benchmarks that assess ToM in conversational and narrative contexts without changes in the world state and find that it consistently improves ToM across five categories of mental states. Our analysis suggests that SoO prefixing elicits faithful thoughts, thereby improving ToM performance.
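The mechanism above can be sketched in a few lines: the assistant turn is pre-seeded with the role-taking sentence so that generation continues from it. This is a minimal illustrative sketch assuming a standard chat-message format with assistant prefill; the function names and message layout are assumptions, not the authors' exact implementation.

```python
def soo_prefix(character: str) -> str:
    """Return the Shoes-of-Others prefix for the target character."""
    return f"Let's put ourselves in {character}'s shoes."


def build_messages(context: str, question: str, character: str) -> list[dict]:
    """Build a chat prompt whose final (partial) assistant turn is the SoO
    prefix, so the model's output is forced to begin with that sentence."""
    return [
        {"role": "user", "content": f"{context}\n\nQuestion: {question}"},
        # Partial assistant turn: the LLM continues generating after this.
        {"role": "assistant", "content": soo_prefix(character)},
    ]


messages = build_messages(
    context="Anna put her keys in the drawer and left the room.",
    question="Where does Anna believe her keys are?",
    character="Anna",
)
```

With APIs that support assistant prefill, the model's completion is appended to the prefix, so its reasoning starts from the role-taking perspective rather than from a blank assistant turn.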