AI Summary
Current evaluations of theory of mind (ToM) in large language models (LLMs) are largely confined to textual inputs and belief-related tasks, offering an incomplete assessment of their social cognition capabilities. To address this limitation, this work proposes CoMMET, the first multimodal ToM benchmark specifically designed for multi-turn dialogues, encompassing a diverse range of mental states and moral judgment tasks. Through the construction of multimodal stimuli, interactive dialogue scenarios, and systematic model evaluation, CoMMET reveals the boundaries of existing LLMs in complex social reasoning. The benchmark provides empirical evidence and actionable insights for advancing the social intelligence of artificial agents, highlighting both current shortcomings and promising directions for future development.
Abstract
Theory of Mind (ToM), the ability to reason about the mental states of oneself and others, is a cornerstone of human social intelligence. As Large Language Models (LLMs) become ubiquitous in real-world applications, validating their capacity for this level of social reasoning is essential for effective and natural interactions. However, existing benchmarks for assessing ToM in LLMs are limited; most rely solely on text inputs and focus narrowly on belief-related tasks. In this paper, we propose a new multimodal benchmark dataset, CoMMET, a Comprehensive Mental states and Moral Evaluation Task inspired by the Theory of Mind Booklet Task. CoMMET expands the scope of evaluation by covering a broader range of mental states and introducing multi-turn testing. To the best of our knowledge, this is the first multimodal dataset to evaluate ToM in a multi-turn conversational setting. Through a comprehensive assessment of LLMs across different families and sizes, we analyze the strengths and limitations of current models and identify directions for future improvement. Our work offers a deeper understanding of the social cognitive capabilities of modern LLMs.