A Unified Representation Underlying the Judgment of Large Language Models

📅 2025-10-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the architectural nature of judgment in large language models (LLMs): whether evaluative judgment relies on dedicated modules or on a unified, cross-domain resource. Method: Using neural representational decoding, targeted interventions, and cross-model analysis, the authors examine evaluative judgments across diverse tasks. Contribution/Results: They identify a single dominant dimension—the Valence-Assent Axis (VAA)—that jointly encodes subjective preference and assent to factual claims across tasks and serves as a generative control signal, constituting a shared, domain-general judgment mechanism. Critically, the study provides the first empirical evidence of the "subordination of reasoning": LLMs prioritize consistency with the VAA over factual accuracy, systematically inducing bias and hallucination. This work establishes the existence of a unified judgment representation in LLMs and furnishes an interpretable, architecture-level account of the fundamental tension between model rationality and reliability.
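
The summary names representational decoding as the core method but does not spell out the procedure. As a rough illustration only, the sketch below extracts a candidate valence-assent direction as a difference of mean hidden states between approved/assented and rejected/denied statements; the model (GPT-2), the layer choice, and the toy prompts are assumptions made for illustration, not the paper's setup.

```python
# Illustrative sketch (assumed setup, not the paper's method): estimate a
# candidate Valence-Assent Axis as the difference of mean last-layer hidden
# states between "approved/assented" and "rejected/denied" statements.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True).eval()

def last_token_state(text: str) -> torch.Tensor:
    """Return the final token's hidden state at the last layer."""
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids)
    return out.hidden_states[-1][0, -1]  # shape: (hidden_dim,)

# Toy contrast set: positive valence / assented facts vs. negative valence / denied facts.
positive = ["The meal was wonderful.", "Paris is the capital of France."]
negative = ["The meal was awful.", "Paris is the capital of Germany."]

pos_mean = torch.stack([last_token_state(t) for t in positive]).mean(dim=0)
neg_mean = torch.stack([last_token_state(t) for t in negative]).mean(dim=0)
vaa = pos_mean - neg_mean
vaa = vaa / vaa.norm()  # unit vector: candidate Valence-Assent Axis

# Decode a held-out judgment by projecting onto the axis: a positive projection
# is read as approval/assent, a negative one as rejection/denial.
score = torch.dot(last_token_state("The tap water here is clean and safe."), vaa)
print(f"VAA projection: {score.item():+.3f}")
```

A learned linear probe or the top principal component of judgment-task activations would be natural alternatives to the difference-of-means estimator used here.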

📝 Abstract
A central architectural question for both biological and artificial intelligence is whether judgment relies on specialized modules or a unified, domain-general resource. While the discovery of decodable neural representations for distinct concepts in Large Language Models (LLMs) has suggested a modular architecture, whether these representations are truly independent systems remains an open question. Here we provide evidence for a convergent architecture. Across a range of LLMs, we find that diverse evaluative judgments are computed along a dominant dimension, which we term the Valence-Assent Axis (VAA). This axis jointly encodes subjective valence ("what is good") and the model's assent to factual claims ("what is true"). Through direct interventions, we show this unified representation creates a critical dependency: the VAA functions as a control signal that steers the generative process to construct a rationale consistent with its evaluative state, even at the cost of factual accuracy. This mechanism, which we term the subordination of reasoning, shifts the process of reasoning from impartial inference toward goal-directed justification. Our discovery offers a mechanistic account for systemic bias and hallucination, revealing how an architecture that promotes coherent judgment can systematically undermine faithful reasoning.
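
The abstract reports that direct interventions show the VAA acting as a control signal over generation, but the intervention procedure is not detailed on this page. The sketch below is a minimal activation-steering experiment under assumed conditions: GPT-2, a single mid-layer forward hook, and a random unit vector standing in for an extracted VAA direction. It illustrates the general technique, not the paper's protocol.

```python
# Illustrative activation-steering sketch (assumed setup, not the paper's
# procedure): add a scaled copy of a unit direction to one transformer block's
# output during generation and compare completions across steering strengths.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

# Placeholder direction; in practice, substitute an axis extracted from hidden states.
vaa = torch.randn(lm.config.n_embd)
vaa = vaa / vaa.norm()

def make_hook(direction: torch.Tensor, alpha: float):
    """Forward hook that shifts a block's hidden states by alpha * direction."""
    def hook(module, inputs, output):
        if isinstance(output, tuple):  # GPT-2 blocks return (hidden_states, ...)
            return (output[0] + alpha * direction.to(output[0].dtype),) + output[1:]
        return output + alpha * direction.to(output.dtype)
    return hook

prompt = "Is this restaurant worth visiting? Explain your answer."
ids = tok(prompt, return_tensors="pt")

for alpha in (-8.0, 0.0, 8.0):  # push toward dissent, leave unchanged, push toward assent
    handle = lm.transformer.h[6].register_forward_hook(make_hook(vaa, alpha))
    with torch.no_grad():
        out = lm.generate(**ids, max_new_tokens=40, do_sample=False,
                          pad_token_id=tok.eos_token_id)
    handle.remove()
    print(f"alpha={alpha:+.0f}: {tok.decode(out[0], skip_special_tokens=True)}")
```

If the abstract's claim holds, generations under opposite steering signs should rationalize opposite verdicts from the same prompt, which is the kind of dependency the paper describes.
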
Problem

Research questions and friction points this paper is trying to address.

Investigates whether judgment in LLMs relies on specialized modules or a unified, domain-general representation
Identifies a single dominant dimension, the Valence-Assent Axis, that encodes both subjective valence and factual assent
Explains how this unified representation induces reasoning bias and hallucination
Innovation

Methods, ideas, or system contributions that make the work stand out.

A unified representation jointly encodes subjective valence and assent to factual claims
The dominant Valence-Assent Axis acts as a control signal that steers the generative process
Subordination of reasoning shifts the process from impartial inference toward goal-directed justification (an illustrative evaluation sketch follows below)
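
As referenced in the last bullet, one simple way to quantify subordination of reasoning, offered here as an illustrative protocol rather than the paper's evaluation, is to measure how often an assent-steered model agrees with claims that are factually false.

```python
# Illustrative metric (not the paper's evaluation): the rate at which an
# assent-steered model endorses claims that are factually false.
from typing import Callable, List, Tuple

def subordination_rate(claims: List[Tuple[str, bool]],
                       assents: Callable[[str], bool]) -> float:
    """Fraction of false claims that the (steered) model nevertheless assents to.

    claims  : (claim_text, is_true) pairs
    assents : returns True if the model agrees with the claim
    """
    false_claims = [text for text, is_true in claims if not is_true]
    if not false_claims:
        return 0.0
    return sum(assents(text) for text in false_claims) / len(false_claims)

# Toy usage with hard-coded responses standing in for an assent-steered model.
claims = [("Paris is the capital of France.", True),
          ("The Sun orbits the Earth.", False),
          ("Water boils at 100 °C at sea level.", True),
          ("Mount Everest is in Brazil.", False)]
steered_says_yes = {text: True for text, _ in claims}  # steered model assents to everything
print(subordination_rate(claims, lambda text: steered_says_yes[text]))  # -> 1.0
```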