SHAPE: Stage-aware Hierarchical Advantage via Potential Estimation for LLM Reasoning

📅 2026-04-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the token inefficiency of large language models in reasoning tasks: existing process supervision methods struggle to distinguish meaningful progress from mere verbosity, which leads to excessive token consumption. The authors model reasoning as a trajectory through a state space of empirical solvability and introduce a hierarchical credit assignment mechanism. At the segment level, a stage-aware advantage function prioritizes efficient breakthroughs in low-potential states; at the token level, entropy-driven redistribution sharpens execution signals. The approach combines multi-granularity credit assignment with state potential estimation, achieving an average accuracy gain of 3% across three base models and five mathematical reasoning benchmarks while reducing token consumption by 30%.
📝 Abstract
Process supervision has emerged as a promising approach for enhancing LLM reasoning, yet existing methods fail to distinguish meaningful progress from mere verbosity, leading to limited reasoning capabilities and unresolved token inefficiency. To address this, we propose Stage-aware Hierarchical Advantage via Potential Estimation (SHAPE), a framework that formalizes reasoning as a trajectory through a state space of empirical solvability. SHAPE introduces a hierarchical credit assignment mechanism: at the segment level, it employs a stage-aware advantage function to prioritize efficient breakthroughs in low-potential states; at the token level, it utilizes entropy-driven redistribution to sharpen execution signals. Extensive experiments in math reasoning across three base models and five benchmarks demonstrate that SHAPE achieves an average accuracy gain of 3% with 30% reduced token consumption.
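The two levels described in the abstract can be illustrated with a minimal sketch. The abstract does not give SHAPE's formulas, so everything below is an assumption made for illustration: potential is taken to be the empirical solve rate of sampled continuations, the stage-aware weight is a hypothetical linear bonus for low-potential states, and token-level redistribution is a hypothetical softmax over token entropies. All function and parameter names are invented here and are not the authors' implementation.

```python
import math


def empirical_potential(outcomes):
    """Assumed potential: fraction of sampled continuations that were solved.

    `outcomes` is a list of 0/1 flags, one per continuation sampled from a state.
    """
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def segment_advantages(potentials, stage_weight=1.0):
    """Hypothetical stage-aware segment advantage.

    `potentials` holds the potential before the first segment and after each
    segment, so K segments need K + 1 values. The advantage of a segment is its
    potential gain, scaled up when the segment starts from a low-potential state.
    """
    advantages = []
    for before, after in zip(potentials, potentials[1:]):
        gain = after - before
        weight = 1.0 + stage_weight * (1.0 - before)  # assumed weighting rule
        advantages.append(weight * gain)
    return advantages


def redistribute_by_entropy(segment_adv, token_entropies, temperature=1.0):
    """Hypothetical entropy-driven redistribution within one segment.

    Tokens receive shares proportional to softmax(entropy / temperature).
    Shares are scaled so the mean token advantage equals `segment_adv`.
    """
    if not token_entropies:
        return []
    n = len(token_entropies)
    peak = max(token_entropies)  # subtract the max for numerical stability
    exps = [math.exp((h - peak) / temperature) for h in token_entropies]
    total = sum(exps)
    return [segment_adv * n * e / total for e in exps]


# Usage: two segments, the first starting from a low-potential state.
potentials = [0.25, 0.75, 1.0]
seg_adv = segment_advantages(potentials)
token_adv = redistribute_by_entropy(seg_adv[0], [0.1, 2.0, 0.5])
```

Under these assumptions, the first segment earns a larger advantage than its raw potential gain because it starts from a hard state, and the highest-entropy token inside it receives the largest share of that advantage.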
Problem

Research questions and friction points this paper is trying to address.

process supervision
LLM reasoning
token inefficiency
reasoning progress
credit assignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

process supervision
hierarchical credit assignment
stage-aware advantage
potential estimation
token efficiency
Zhengyang Ai
Huawei Taylor Lab
Zikang Shan
Peking University
Reinforcement Learning
Xiaodong Ai
Huawei Taylor Lab
Jingxian Tang
Huawei Taylor Lab
Hangkai Hu
Huawei Taylor Lab
Pinyan Lu
ITCS, Shanghai University of Finance and Economics
Complexity, Algorithm, Game Theory