Social Caption: Evaluating Social Understanding in Multimodal Models

📅 2026-01-21
🤖 AI Summary
This work addresses the limited ability of multimodal large language models (MLLMs) to understand human social interactions and the lack of systematic frameworks for evaluating that ability. The authors propose Social Caption, a framework grounded in interaction theory that evaluates social understanding along three dimensions: social inference, holistic social analysis, and directed social analysis. Each dimension is assessed through structured tasks and dedicated metrics. The authors analyze how factors such as model scale, architectural design, and spoken context affect performance, and use experiments with MLLM judges to draw insights about scaling automated evaluation of multimodal social understanding.


📝 Abstract
Social understanding abilities are crucial for multimodal large language models (MLLMs) to interpret human social interactions. We introduce Social Caption, a framework grounded in interaction theory to evaluate the social understanding abilities of MLLMs along three dimensions: Social Inference (SI), the ability to make accurate inferences about interactions; Holistic Social Analysis (HSA), the ability to generate comprehensive descriptions of interactions; and Directed Social Analysis (DSA), the ability to extract relevant social information from interactions. We analyze factors influencing model performance in social understanding, such as scale, architectural design, and spoken context. Experiments with MLLM judges provide insights into scaling automated evaluation of multimodal social understanding.
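The abstract does not specify how MLLM-judge scores are produced or validated. As an illustration only, the sketch below assumes that each model response receives a numeric human rating and a numeric MLLM-judge rating on one of the three dimensions (SI, HSA, DSA), and that judge reliability is measured per dimension with Pearson correlation. The `Rating` class, the `judge_agreement` function, and the score scale are hypothetical and are not the paper's evaluation protocol.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean

# The three Social Caption dimensions named in the abstract.
DIMENSIONS = ("SI", "HSA", "DSA")


@dataclass
class Rating:
    """Hypothetical paired scores for one model response."""
    dimension: str  # one of DIMENSIONS
    human: float    # assumed human annotator score
    judge: float    # assumed MLLM-judge score for the same response


def pearson(xs, ys):
    """Pearson correlation; returns NaN if either series is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def judge_agreement(ratings):
    """Correlate judge and human scores separately for each dimension.

    Dimensions with fewer than two rated responses are skipped.
    """
    result = {}
    for dim in DIMENSIONS:
        pairs = [(r.human, r.judge) for r in ratings if r.dimension == dim]
        if len(pairs) < 2:
            continue
        humans, judges = zip(*pairs)
        result[dim] = pearson(humans, judges)
    return result


ratings = [
    Rating("SI", 1, 2), Rating("SI", 2, 4), Rating("SI", 3, 6),
    Rating("HSA", 1, 3), Rating("HSA", 2, 2), Rating("HSA", 3, 1),
]
print(judge_agreement(ratings))
```

With these toy ratings, SI agreement comes out at approximately 1.0 and HSA at approximately -1.0, while DSA is omitted because it has no ratings. Computing correlation per dimension, rather than pooling all scores, reflects the idea that a judge may be reliable for one kind of social analysis but not another.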
Problem

Research questions and friction points this paper is trying to address.

Social Understanding
Multimodal Models
Social Inference
Holistic Social Analysis
Directed Social Analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Social Caption
Multimodal Large Language Models
Social Understanding Evaluation
Interaction Theory
Automated Evaluation