Cognitive Chain-of-Thought: Structured Multimodal Reasoning about Social Situations

📅 2025-07-27
🤖 AI Summary
Problem: Vision-language models (VLMs) struggle to combine perception, understanding, and normative judgment in social contexts. Method: This paper proposes Cognitive Chain-of-Thought (CoCoT), a cognition-inspired prompting strategy with three reasoning stages: (1) perception, describing what is visually observable; (2) situation, modeling the social relationships and context among the observed elements; and (3) norm, applying social norms to reach a norm-grounded judgment. Unlike flat chain-of-thought (CoT) prompting, which often breaks down on such tasks, CoCoT explicitly separates these stages and orders them so that each builds on the previous one. Contribution/Results: Across multiple multimodal benchmarks covering intent disambiguation, commonsense reasoning, and safety, CoCoT consistently outperforms CoT and direct prompting, by 8% on average. The authors also report that the staged reasoning improves the interpretability and social awareness of VLMs.

📝 Abstract
Chain-of-Thought (CoT) prompting helps models think step by step. But what happens when they must see, understand, and judge, all at once? In visual tasks grounded in social context, where bridging perception with norm-grounded judgments is essential, flat CoT often breaks down. We introduce Cognitive Chain-of-Thought (CoCoT), a prompting strategy that scaffolds VLM reasoning through three cognitively inspired stages: perception, situation, and norm. Our experiments show that, across multiple multimodal benchmarks (including intent disambiguation, commonsense reasoning, and safety), CoCoT consistently outperforms CoT and direct prompting (+8% on average). Our findings demonstrate that cognitively grounded reasoning stages enhance interpretability and social awareness in VLMs, paving the way for safer and more reliable multimodal systems.
Problem

Research questions and friction points this paper is trying to address.

Enhance multimodal reasoning in social contexts
Bridge perception with norm-grounded judgments
Improve interpretability and social awareness in VLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Structured multimodal reasoning with cognitive stages
Three-stage prompting: perception, situation, norm
Enhances VLM interpretability and social awareness
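
To make the three-stage idea concrete, here is a minimal sketch of how a staged prompt could be assembled next to a flat CoT prompt. The stage names (perception, situation, norm) come from the abstract. Everything else is an assumption for illustration: the instruction wording, the `Stage` class, and the `build_staged_prompt` and `build_flat_cot_prompt` helpers are hypothetical and are not the paper's actual prompts or code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One reasoning stage: a label plus the instruction shown to the model."""
    name: str
    instruction: str


# Hypothetical instruction wording; the paper's real prompts are not given here.
STAGES = (
    Stage("Perception", "Describe what is directly observable in the image."),
    Stage("Situation", "Using those observations, describe the relationships "
                       "and context among the people and objects."),
    Stage("Norm", "Using that context, state the most socially plausible "
                  "interpretation or judgment."),
)


def build_staged_prompt(question: str, stages=STAGES) -> str:
    """Assemble a text prompt that asks for the stages in a fixed order."""
    lines = [
        f"Question: {question}",
        "",
        "Reason through the following stages in order:",
    ]
    for number, stage in enumerate(stages, start=1):
        lines.append(f"{number}. {stage.name}: {stage.instruction}")
    lines += ["", "Final answer:"]
    return "\n".join(lines)


def build_flat_cot_prompt(question: str) -> str:
    """Baseline for comparison: a generic, unstructured step-by-step prompt."""
    return f"Question: {question}\n\nLet's think step by step.\n\nFinal answer:"


if __name__ == "__main__":
    print(build_staged_prompt("Is the person in the image asking for help?"))
```

The resulting string would be sent to a VLM together with the image; that call is left out because it depends on the model API in use. Keeping the stages in an ordered tuple makes the perception, situation, norm sequence explicit and easy to compare against the flat baseline.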