CoMMET: To What Extent Can LLMs Perform Theory of Mind Tasks?

πŸ“… 2026-03-12
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Current evaluations of theory of mind (ToM) in large language models (LLMs) are largely confined to textual inputs and belief-related tasks, offering an incomplete assessment of their social cognition capabilities. To address this limitation, this work proposes CoMMETβ€”the first multimodal ToM benchmark specifically designed for multi-turn dialogues, encompassing a diverse range of mental states and moral judgment tasks. Through the construction of multimodal stimuli, interactive dialogue scenarios, and systematic model evaluation, CoMMET reveals the boundaries of existing LLMs in complex social reasoning. The benchmark provides empirical evidence and actionable insights for advancing the social intelligence of artificial agents, highlighting both current shortcomings and promising directions for future development.

πŸ“ Abstract
Theory of Mind (ToM), the ability to reason about the mental states of oneself and others, is a cornerstone of human social intelligence. As Large Language Models (LLMs) become ubiquitous in real-world applications, validating their capacity for this level of social reasoning is essential for effective and natural interactions. However, existing benchmarks for assessing ToM in LLMs are limited; most rely solely on text inputs and focus narrowly on belief-related tasks. In this paper, we propose a new multimodal benchmark dataset, CoMMET, a Comprehensive Mental states and Moral Evaluation Task inspired by the Theory of Mind Booklet Task. CoMMET expands the scope of evaluation by covering a broader range of mental states and introducing multi-turn testing. To the best of our knowledge, this is the first multimodal dataset to evaluate ToM in a multi-turn conversational setting. Through a comprehensive assessment of LLMs across different families and sizes, we analyze the strengths and limitations of current models and identify directions for future improvement. Our work offers a deeper understanding of the social cognitive capabilities of modern LLMs.
Problem

Research questions and friction points this paper is trying to address.

Theory of Mind
Large Language Models
multimodal benchmark
mental states
social reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal benchmark
Theory of Mind
multi-turn conversation
mental states
large language models
Ruirui Chen
Institute of High Performance Computing (IHPC) and Centre for Frontier AI Research (CFAR), Agency for Science, Technology and Research (A*STAR), Singapore
Weifeng Jiang
Nanyang Technological University, Singapore
Chengwei Qin
HKUST(GZ), NTU
LLM, NLP
Cheston Tan
Institute for Infocomm Research; Centre for Frontier AI Research
Cognitively-Inspired AI, Embodied AI, AGI, Human-Centric Systems, Assistive AI