SPREG: Structured Plan Repair with Entropy-Guided Test-Time Intervention for Large Language Model Reasoning

📅 2026-04-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the susceptibility of large language models to logical hallucinations and entropy drift during long-chain reasoning, a setting in which static guidance often causes semantic dilution. The authors propose a lightweight, inference-time intervention framework that uses an adaptive dual-threshold mechanism to detect abrupt entropy surges in real time and treats them as indicators of logical errors. When a surge is detected, the method replaces the uninformative null prior with a reference distribution fused from historical high-confidence states, and it adjusts guidance intensity according to the current structured reasoning stage. Combining real-time entropy monitoring, dynamic null-prior replacement, and phase-aware guidance modulation, the approach achieves an absolute accuracy improvement of 20.0% on the AIME25 benchmark while mitigating uncontrolled entropy drift in complex reasoning tasks.
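The entropy-monitoring step described in the summary can be illustrated with a minimal sketch. The page gives no formulas, so everything here is an assumption made for illustration: the class name `EntropySpikeDetector`, the reading of "dual threshold" as an absolute floor combined with a running mean-plus-k-standard-deviations bound, and the default values of `window`, `k`, and `abs_floor`. It is not the authors' implementation.

```python
import math
from collections import deque


def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


class EntropySpikeDetector:
    """Hypothetical dual-threshold spike detector (an assumption, not the paper's rule).

    A step is flagged when its entropy exceeds BOTH:
      1. a fixed absolute floor, and
      2. an adaptive bound: running mean + k * std over recent non-spike steps.
    """

    def __init__(self, window=16, k=2.0, abs_floor=1.0):
        self.history = deque(maxlen=window)  # recent non-spike entropies
        self.k = k
        self.abs_floor = abs_floor

    def update(self, entropy):
        spike = False
        if len(self.history) >= 2:
            mean = sum(self.history) / len(self.history)
            var = sum((h - mean) ** 2 for h in self.history) / len(self.history)
            adaptive = mean + self.k * math.sqrt(var)
            spike = entropy > adaptive and entropy > self.abs_floor
        if not spike:
            # Keep spikes out of the baseline so one outlier does not raise the bound.
            self.history.append(entropy)
        return spike


detector = EntropySpikeDetector()
calm = [detector.update(h) for h in (0.20, 0.25, 0.20, 0.22)]
surge = detector.update(token_entropy([1.0 / 50] * 50))  # near-uniform over 50 tokens
```

In this sketch, the four low-entropy steps build the baseline and the near-uniform step is the one that gets flagged. Excluding flagged steps from the history is a design choice made here so that the adaptive bound tracks the stable regime only.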

📝 Abstract

Large Language Models (LLMs) are prone to logical hallucinations and stochastic drifts during long-chain reasoning. While Classifier-Free Guidance (CFG) can improve instruction adherence, standard static implementations often cause semantic dilution and linguistic degradation. We propose SPREG (Structured Plan-guided Real-time Entropy Gating), a lightweight inference-time framework for surgical error rectification. SPREG employs an adaptive dual-threshold mechanism to monitor real-time entropy, identifying sudden "entropy spikes" as reliable indicators of logical failure. Upon detection, it triggers a dynamic repair by replacing uninformative null-priors with reference distributions synthesized from historical high-confidence states. By modulating guidance intensity according to structured reasoning stages (e.g., Action, Observation), SPREG steers the model back to a stable manifold without compromising fluency. Our experiments demonstrate significant gains, notably a 20.0% absolute accuracy improvement on AIME25, while effectively suppressing uncontrolled entropy drift in complex tasks.
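As a rough illustration of the repair step, the sketch below takes the standard CFG combination in log space, log p_guided ∝ log p_prior + w (log p_cond − log p_prior), and plugs a reference distribution into the slot normally held by the null prior. The abstract does not specify the fusion rule, the sign convention, or the per-stage weights, so the simple averaging in `reference_distribution`, the `STAGE_SCALE` values, and all function names are hypothetical choices made for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reference_distribution(confident_states):
    """Fuse stored high-confidence distributions by plain averaging (assumed rule)."""
    n = len(confident_states)
    vocab = len(confident_states[0])
    return [sum(state[i] for state in confident_states) / n for i in range(vocab)]


# Hypothetical guidance weights per reasoning stage; the values are illustrative only.
STAGE_SCALE = {"Action": 1.5, "Observation": 1.0}


def guided_distribution(cond_probs, ref_probs, stage, eps=1e-9):
    """CFG-style combination with the reference distribution in the prior slot."""
    w = STAGE_SCALE.get(stage, 1.0)
    logits = []
    for c, r in zip(cond_probs, ref_probs):
        log_c = math.log(c + eps)
        log_r = math.log(r + eps)
        logits.append(log_r + w * (log_c - log_r))
    return softmax(logits)


ref = reference_distribution([[0.5, 0.5], [0.7, 0.3]])
cond = [0.8, 0.2]
observation = guided_distribution(cond, ref, "Observation")  # w = 1 recovers cond
action = guided_distribution(cond, ref, "Action")            # w > 1 amplifies the contrast
```

With a weight of 1 the guided distribution reduces to the conditional one, which makes the stage weight the only knob controlling how strongly the intervention acts. Whether SPREG applies the guidance in this direction is not stated on this page, so the sketch should be read as one literal interpretation of "replacing null-priors" rather than as the method itself.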

Problem

Research questions and friction points this paper is trying to address.

logical hallucinations

stochastic drifts

long-chain reasoning

semantic dilution

linguistic degradation

Innovation

Methods, ideas, or system contributions that make the work stand out.

entropy-guided intervention

structured reasoning

test-time adaptation