Social Genome: Grounded Social Reasoning Abilities of Multimodal Models

📅 2025-02-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of evaluating and improving the fine-grained social reasoning of multimodal models in authentic social contexts. To this end, we introduce Social Genome, the first benchmark for fine-grained, grounded social reasoning. It comprises 272 annotated interaction videos and 1,486 human-annotated reasoning traces, and it requires models to integrate visual, verbal, and vocal cues with external knowledge. It is also the first modeling challenge to study external knowledge in social reasoning, and its metrics evaluate both the semantic correctness and the structural coherence of model-generated reasoning traces. Assessment of state-of-the-art multimodal models reveals consistent deficiencies in evidence citation, knowledge integration, and reasoning coherence. The benchmark provides a reproducible foundation for social intelligence research and points to concrete directions for model improvement.

📝 Abstract

Social reasoning abilities are crucial for AI systems to effectively interpret and respond to multimodal human communication and interaction within social contexts. We introduce Social Genome, the first benchmark for fine-grained, grounded social reasoning abilities of multimodal models. Social Genome contains 272 videos of interactions and 1,486 human-annotated reasoning traces related to inferences about these interactions. These traces contain 5,777 reasoning steps that reference evidence from visual cues, verbal cues, vocal cues, and external knowledge (contextual knowledge external to videos). Social Genome is also the first modeling challenge to study external knowledge in social reasoning. Social Genome computes metrics to holistically evaluate semantic and structural qualities of model-generated social reasoning traces. We demonstrate the utility of Social Genome through experiments with state-of-the-art models, identifying performance gaps and opportunities for future research to improve the grounded social reasoning abilities of multimodal models.
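To make the structure of a reasoning trace concrete, here is a minimal Python sketch of how one might be represented, with each step tagged by the kind of evidence it references. The class names, field names, and the example trace are hypothetical illustrations, not the dataset's actual schema or content; only the four evidence types and the dataset counts come from the abstract.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum


class Evidence(Enum):
    """The four evidence types named in the abstract."""
    VISUAL = "visual"
    VERBAL = "verbal"
    VOCAL = "vocal"
    EXTERNAL = "external_knowledge"


@dataclass
class ReasoningStep:
    # Hypothetical fields: one statement plus the evidence type it cites.
    text: str
    evidence: Evidence


@dataclass
class ReasoningTrace:
    # Hypothetical fields: the real annotation format may differ.
    video_id: str
    inference: str
    steps: list[ReasoningStep] = field(default_factory=list)


def evidence_counts(trace: ReasoningTrace) -> Counter:
    """Count how many steps in a trace cite each evidence type."""
    return Counter(step.evidence for step in trace.steps)


# Invented example trace, for illustration only.
example = ReasoningTrace(
    video_id="video_0001",
    inference="The listener is uncomfortable with the topic.",
    steps=[
        ReasoningStep("The listener looks away and crosses their arms.", Evidence.VISUAL),
        ReasoningStep("The listener says they would rather not discuss it.", Evidence.VERBAL),
        ReasoningStep("The listener's voice becomes quieter.", Evidence.VOCAL),
        ReasoningStep("Avoiding eye contact often signals discomfort.", Evidence.EXTERNAL),
    ],
)

# Dataset-level counts reported in the abstract.
NUM_VIDEOS = 272
NUM_TRACES = 1486
NUM_STEPS = 5777

avg_steps_per_trace = NUM_STEPS / NUM_TRACES
avg_traces_per_video = NUM_TRACES / NUM_VIDEOS
```

From the reported counts, this works out to about 3.89 reasoning steps per trace and about 5.46 traces per video on average.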

Problem

Research questions and friction points this paper is trying to address.

Develops benchmark for social reasoning

Evaluates multimodal AI interaction interpretation

Identifies gaps in social reasoning abilities

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal social reasoning benchmark

Annotated interaction reasoning traces

Holistic evaluation metrics introduced

🔎 Similar Papers

MuMA-ToM: Multi-modal Multi-Agent Theory of Mind

2024-08-22, arXiv.org, Citations: 0
