ToM-SSI: Evaluating Theory of Mind in Situated Social Interactions

📅 2025-09-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Existing Theory of Mind (ToM) benchmarks predominantly rely on the Sally-Anne paradigm and are confined to text-only or dyadic interactions, so they fail to capture complex, multimodal, multi-agent social interactions. Method: We introduce ToM-SSI, a multimodal ToM evaluation benchmark designed for situated social scenarios, in which groups of up to four agents communicate and move through a shared environment. It features mixed cooperative-obstructive settings and tasks that require reasoning about several agents' mental states in parallel. Contribution/Results: Empirical evaluation shows that current large language and multimodal models remain severely limited, especially on the new mixed cooperative-obstructive and parallel multi-agent reasoning tasks, exposing critical gaps in current ToM modeling. ToM-SSI provides a testbed for assessing and advancing ToM capabilities in situated, interactive social settings.

📝 Abstract

Most existing Theory of Mind (ToM) benchmarks for foundation models rely on variations of the Sally-Anne test, offering only a very limited perspective on ToM and neglecting the complexity of human social interactions. To address this gap, we propose ToM-SSI: a new benchmark specifically designed to test ToM capabilities in environments rich with social interactions and spatial dynamics. While current ToM benchmarks are limited to text-only or dyadic interactions, ToM-SSI is multimodal and includes group interactions of up to four agents that communicate and move in situated environments. This unique design allows us to study, for the first time, mixed cooperative-obstructive settings and reasoning about multiple agents' mental states in parallel, thus capturing a wider range of social cognition than existing benchmarks. Our evaluations reveal that current models' performance is still severely limited, especially in these new tasks, highlighting critical gaps for future research.

Problem

Research questions and friction points this paper is trying to address.

Evaluating Theory of Mind in complex social interactions

Testing multimodal group interactions with spatial dynamics

Assessing mental state reasoning for multiple parallel agents

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal benchmark with group interactions

Evaluates mixed cooperative-obstructive social settings

Assesses parallel mental state reasoning for multiple agents

🔎 Similar Papers

MuMA-ToM: Multi-modal Multi-Agent Theory of Mind

2024-08-22, arXiv.org, Citations: 0

Entering Real Social World! Benchmarking the Social Intelligence of Large Language Models from a First-person Perspective

2024-10-08, Citations: 0