Cognition-of-Thought Elicits Social-Aligned Reasoning in Large Language Models

📅 2025-09-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current large language models (LLMs) rely predominantly on implicit, static, weight-based safety alignment, which hinders auditability and dynamic adaptation. To address this, we propose Cognition-of-Thought (CooT), a decoding-phase framework that establishes a collaborative generator–perceiver self-monitoring loop, enabling explicit, dynamic sociocognitive alignment during inference. Our approach shifts alignment from the parameter level to the reasoning level, supporting policy updates without fine-tuning. Key innovations include hierarchical value-principle evaluation, a generation rollback mechanism, and guided injection that combines universal social priors with context-aware warnings. We further introduce a priority-based cognitive intervention framework for targeted alignment control. Experiments show that CooT achieves substantial improvements across multiple safety and social reasoning benchmarks, with strong generalization, real-time adaptability, and improved interpretability and auditability, all without modifying model weights.

📝 Abstract
Large language models (LLMs) excel at complex reasoning but can still exhibit harmful behaviors. Current alignment strategies typically embed safety into model weights, making these controls implicit, static, and difficult to modify. This paper introduces Cognition-of-Thought (CooT), a novel decoding-time framework that equips LLMs with an explicit cognitive self-monitoring loop. CooT couples a standard text Generator with a cognitive Perceiver that continuously monitors the unfolding sequence. The Perceiver uses a structured, precedence-based hierarchy of principles (e.g., safety over obedience) to detect potential misalignments as they arise. When violations are flagged, CooT intervenes by rolling back the generation to the point of error and regenerating under injected guidance that combines universal social priors with context-specific warnings. CooT thus transforms alignment from a fixed property into an explicit, dynamic, and auditable process active during inference, allowing for flexible policy updates without retraining the model. Extensive experiments across multiple benchmarks and model families confirm that CooT consistently improves safety and social reasoning performance.
Problem

Research questions and friction points this paper is trying to address.

Enhancing LLM safety through explicit cognitive monitoring during generation
Replacing static alignment with dynamic auditable reasoning processes
Preventing harmful behaviors using structured social principle hierarchies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decoding-time framework with cognitive self-monitoring loop
Precedence-based principle hierarchy detects misalignments as they arise
Rollback to the point of error, then regeneration under injected guidance
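The loop described in the abstract and the innovation points above (generate token by token, monitor against a precedence-ordered principle hierarchy, roll back on a flagged violation, regenerate under injected guidance) can be sketched in Python. This is an illustrative toy, not the paper's implementation: `Perceiver`, `coot_decode`, the detector functions, and the `[warn:...]` guidance tokens are all assumed names, and the sketch operates on plain strings rather than real model tokens or logits.

```python
# Hedged sketch of a CooT-style generate-monitor-rollback loop.
# All names below are illustrative assumptions, not the paper's API.

PRINCIPLES = ["safety", "obedience", "helpfulness"]  # precedence order

class Perceiver:
    """Checks a partial sequence and flags the index of the first token
    that violates a principle, trying higher-precedence principles first."""
    def __init__(self, detectors):
        # detectors: principle name -> fn(tokens) -> violation index or None
        self.detectors = detectors

    def check(self, tokens):
        for principle in PRINCIPLES:
            idx = self.detectors[principle](tokens)
            if idx is not None:
                return principle, idx
        return None

def coot_decode(generate_step, perceiver, guidance_for, max_len=64):
    """Generate token by token; on a flagged violation, roll back to the
    point of error and regenerate under injected guidance."""
    tokens, guidance = [], []
    while len(tokens) < max_len:
        tok = generate_step(guidance + tokens)
        if tok is None:  # end of sequence
            break
        tokens.append(tok)
        flag = perceiver.check(tokens)
        if flag:
            principle, idx = flag
            tokens = tokens[:idx]               # rollback to the error point
            guidance = guidance_for(principle)  # inject contextual warning
    return tokens

# Toy demo: the generator tries to emit "bad" at step 3 unless warned.
detectors = {
    "safety": lambda toks: toks.index("bad") if "bad" in toks else None,
    "obedience": lambda toks: None,
    "helpfulness": lambda toks: None,
}

def toy_step(ctx):
    visible = [t for t in ctx if not t.startswith("[")]
    if len(visible) >= 5:
        return None
    if len(visible) == 3 and "[warn:safety]" not in ctx:
        return "bad"
    return f"tok{len(visible)}"

result = coot_decode(toy_step, Perceiver(detectors), lambda p: [f"[warn:{p}]"])
# result contains no "bad" token: the violation was rolled back and the
# continuation regenerated under the injected safety warning.
```

Because the perceiver runs on every partial sequence, intervention is localized: only the offending suffix is discarded, and the guidance prefix steers regeneration without touching model weights, which is what makes the process auditable and policy-updatable at inference time.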
Authors

Xuanming Zhang (University of Wisconsin-Madison)
Yuxuan Chen (Tsinghua University)
Min-Hsuan Yeh (University of Wisconsin-Madison)
Yixuan Li (University of Wisconsin-Madison)

Natural Language Processing