🤖 AI Summary
Existing Theory of Mind (ToM) evaluation frameworks focus predominantly on English, overlooking how linguistic diversity shapes mental-state reasoning. Method: We introduce "Multilingual Theory of Mind" (Multilingual ToM) and present XToM, the first cross-lingual ToM benchmark, covering Chinese, English, French, Spanish, and Japanese across diverse cognitive scenarios. Task templates are grounded in cognitive-science principles and constructed via human annotation, expert validation, and multilingual prompt engineering; evaluation follows zero-shot and few-shot protocols. Contribution/Results: Experiments reveal that state-of-the-art large language models (e.g., DeepSeek R1), despite strong multilingual comprehension, show significant cross-lingual variance in ToM performance, indicating that they lack human-level multilingual generalization in mental-state inference. This work provides the first empirical evidence that ToM performance depends on language and establishes both a novel benchmark and a theoretical foundation for multilingual cognitive modeling.
📝 Abstract
Theory of Mind (ToM), the ability to infer others' mental states, is pivotal for human social cognition. Existing evaluations of ToM in LLMs are largely limited to English, neglecting the linguistic diversity that shapes human cognition. This limitation raises a critical question: can LLMs exhibit Multilingual Theory of Mind, the capacity to reason about mental states across diverse linguistic contexts? To address this gap, we present XToM, a rigorously validated multilingual benchmark that evaluates ToM across five languages and incorporates diverse, contextually rich task scenarios. Using XToM, we systematically evaluate LLMs (e.g., DeepSeek R1), revealing a pronounced dissonance: while models excel at multilingual language understanding, their ToM performance varies markedly across languages. Our findings expose limitations in LLMs' ability to replicate human-like mentalizing across linguistic contexts.