Entropy-Gradient Inversion: Moving Toward Internal Mechanism of Large Reasoning Models

📅 2026-05-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two problems in large reasoning models: the disconnect between token-level behavioral analysis and internal reasoning mechanisms, and the instability of reinforcement learning training that relies on costly external verifiers. The authors identify and formally define a phenomenon they call “entropy-gradient inversion”, a robust negative correlation between token entropy and logit gradients, which they interpret as a geometric fingerprint of a model’s reasoning capability. Building on this, they propose Correlation-Regularized Group Policy Optimization (CorR-PO), which incorporates this intrinsic signal into reward regularization to stabilize reasoning optimization. In experiments across multiple model scales and reasoning benchmarks, CorR-PO consistently outperforms state-of-the-art baselines, and stronger entropy-gradient inversion correlates with better reasoning performance.

📝 Abstract

The advancement of Large Reasoning Models (LRMs) has catalyzed a paradigm shift from reactive “fast thinking” text generation to systematic, step-by-step “slow thinking” reasoning, unlocking state-of-the-art performance in complex mathematical and logical tasks. However, the field faces the fundamental gap between token-level behavioral analysis and internal reasoning mechanisms, and the instability of reinforcement learning (RL) for reasoning optimization relying on costly external verifiers. We identify and formally define Entropy-Gradient Inversion, a robust negative correlation between token entropy and logit gradients that acts as a definitive geometric fingerprint for LRM reasoning capability. Building on this, we propose Correlation-Regularized Group Policy Optimization (CorR-PO), which embeds this inversion signature into RL reward regularization. Extensive experiments on various reasoning benchmarks across multiple model scales show CorR-PO consistently outperforms state-of-the-art baselines, confirming that stronger inversion directly correlates with superior reasoning performance.
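To make the quantity in the abstract concrete, here is a minimal sketch of how a correlation between token entropy and logit gradients might be measured. The abstract does not define the gradient term precisely, so this sketch assumes it is the L2 norm of the gradient of the sampled token's negative log-likelihood with respect to the logits, and it assumes a Pearson correlation. All function names are hypothetical and are not taken from the paper.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the vocabulary axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def token_entropy(probs):
    # Shannon entropy (in nats) of each token's predictive distribution.
    return -(probs * np.log(probs + 1e-12)).sum(axis=-1)

def logit_grad_norm(probs, token_ids):
    # ASSUMPTION: "logit gradient" is taken to be d(-log p[y]) / d(logits),
    # which for a softmax equals p - onehot(y). We return its L2 norm.
    grad = probs.copy()
    grad[np.arange(len(token_ids)), token_ids] -= 1.0
    return np.linalg.norm(grad, axis=-1)

def entropy_gradient_correlation(logits, token_ids):
    # ASSUMPTION: Pearson correlation across the tokens of one sequence.
    probs = softmax(logits)
    h = token_entropy(probs)
    g = logit_grad_norm(probs, token_ids)
    return float(np.corrcoef(h, g)[0, 1])

# Synthetic logits only exercise the code; they say nothing about real models.
rng = np.random.default_rng(0)
logits = rng.normal(size=(64, 50)) * rng.uniform(0.1, 5.0, size=(64, 1))
token_ids = rng.integers(0, 50, size=64)
corr = entropy_gradient_correlation(logits, token_ids)
print(f"entropy vs. logit-gradient-norm Pearson r = {corr:.3f}")
```

Under the abstract's claim, a strongly negative value of such a correlation on real model outputs would indicate inversion. The sign obtained here depends entirely on the assumed gradient definition and the synthetic data, so it should not be read as evidence for or against the paper's result.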

Problem

Research questions and friction points this paper is trying to address.

Large Reasoning Models

internal reasoning mechanisms

reinforcement learning instability

token-level behavioral analysis

external verifiers

Innovation

Methods, ideas, or system contributions that make the work stand out.

Entropy-Gradient Inversion

Large Reasoning Models

Correlation-Regularized RL