Entropy-Gradient Inversion: Moving Toward Internal Mechanism of Large Reasoning Models

📅 2026-05-17

🤖 AI Summary
This work addresses two problems in large reasoning models. The first is the disconnect between token-level behavioral analysis and the models' internal reasoning mechanisms. The second is the instability of reinforcement learning (RL) training that relies on costly external verifiers. The authors identify and formally define a new phenomenon, “entropy-gradient inversion”: a strong negative correlation between token entropy and logit gradients, which they interpret as a geometric fingerprint of a model’s reasoning capability. Building on this insight, they propose CorR-PO (Correlation-Regularized Group Policy Optimization), an algorithm that incorporates this intrinsic signal into reward regularization to stabilize reasoning optimization. Experiments across multiple model scales and reasoning benchmarks show that CorR-PO consistently outperforms state-of-the-art baselines and that stronger entropy-gradient inversion correlates with better reasoning performance. The authors present this intrinsic signal as a way to move beyond RL paradigms that depend solely on external supervision.
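To make the measured quantity concrete, the sketch below computes per-token entropy from logits and the Pearson correlation between those entropies and a per-token gradient magnitude. It assumes that the gradient magnitudes (`grad_norms`) are computed elsewhere and passed in. The summary does not say exactly how “logit gradients” are defined, so that part is left as an input. It also assumes that a more negative correlation corresponds to stronger inversion. The helper names `token_entropy`, `pearson`, and `inversion_score` are hypothetical. This is an illustration, not the authors' implementation.

```python
import math
from typing import Sequence


def token_entropy(logits: Sequence[float]) -> float:
    """Shannon entropy (in nats) of the softmax distribution over one token's logits."""
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0.0 or vy == 0.0:
        return 0.0  # correlation is undefined for constant input; treat as "no signal"
    return cov / math.sqrt(vx * vy)


def inversion_score(
    token_logits: Sequence[Sequence[float]], grad_norms: Sequence[float]
) -> float:
    """Hypothetical helper: entropy-gradient correlation over one response.

    grad_norms holds one externally computed gradient magnitude per token (an assumption).
    A more negative return value is read here as a stronger entropy-gradient inversion.
    """
    entropies = [token_entropy(logits) for logits in token_logits]
    return pearson(entropies, grad_norms)
```

For example, `inversion_score([[5.0, 0.0, 0.0], [1.0, 0.5, 0.0], [0.0, 0.0, 0.0]], [0.9, 0.5, 0.1])` is negative. In that input, entropy rises from a peaked token to a uniform one while the supplied gradient magnitudes fall.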
📝 Abstract
The advancement of Large Reasoning Models (LRMs) has catalyzed a paradigm shift from reactive “fast thinking” text generation to systematic, step-by-step “slow thinking” reasoning, unlocking state-of-the-art performance in complex mathematical and logical tasks. However, the field faces two fundamental problems: the gap between token-level behavioral analysis and internal reasoning mechanisms, and the instability of reinforcement learning (RL) for reasoning optimization, which relies on costly external verifiers. We identify and formally define Entropy-Gradient Inversion, a robust negative correlation between token entropy and logit gradients that acts as a definitive geometric fingerprint of LRM reasoning capability. Building on this, we propose Correlation-Regularized Group Policy Optimization (CorR-PO), which embeds this inversion signature into RL reward regularization. Extensive experiments on various reasoning benchmarks across multiple model scales show that CorR-PO consistently outperforms state-of-the-art baselines, confirming that stronger inversion directly correlates with superior reasoning performance.
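The abstract says only that CorR-PO “embeds this inversion signature into RL reward regularization” within a group policy optimization framework. The sketch below shows one plausible shape for that idea, with several assumptions. Each sampled response in a group receives a shaped reward `r - lam * rho`, where `rho` is its entropy-gradient correlation, so stronger (more negative) inversion earns a bonus. Advantages are then normalized against the group mean and standard deviation, as in GRPO. The function name `corr_regularized_advantages`, the linear shaping term, and the defaults `lam=0.1` and `eps=1e-6` are all hypothetical. The paper's actual regularizer may differ.

```python
import math
from typing import List, Sequence


def corr_regularized_advantages(
    rewards: Sequence[float],
    inversion_scores: Sequence[float],
    lam: float = 0.1,
    eps: float = 1e-6,
) -> List[float]:
    """Hypothetical sketch of group-relative advantages with a correlation-based reward term.

    rewards: one task reward per sampled response in the group.
    inversion_scores: one entropy-gradient correlation (rho) per response; more negative is
        assumed to mean stronger inversion.
    lam: weight of the assumed regularizer; eps: guards against a zero group std.
    """
    if not rewards or len(rewards) != len(inversion_scores):
        raise ValueError("need one inversion score per reward, and a non-empty group")
    # Assumed shaping: subtracting lam * rho rewards responses with stronger inversion.
    shaped = [r - lam * rho for r, rho in zip(rewards, inversion_scores)]
    mean = sum(shaped) / len(shaped)
    std = math.sqrt(sum((s - mean) ** 2 for s in shaped) / len(shaped))
    # GRPO-style normalization: each response is scored relative to its own group.
    return [(s - mean) / (std + eps) for s in shaped]
```

One consequence of this design is that two responses with the same task reward are no longer tied: the one with the stronger inversion receives the higher advantage. This is one way an intrinsic signal could supplement a sparse verifier reward.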
Problem

Research questions and friction points this paper is trying to address.

Large Reasoning Models
internal reasoning mechanisms
reinforcement learning instability
token-level behavioral analysis
external verifiers

Innovation

Methods, ideas, or system contributions that make the work stand out.

Entropy-Gradient Inversion
Large Reasoning Models
Correlation-Regularized RL
Reasoning Mechanism
Token Entropy