Adaptive Termination for Multi-round Parallel Reasoning: An Universal Semantic Entropy-Guided Framework

📅 2025-07-09
🤖 AI Summary
Existing LLM inference methods suffer from two limitations: sequential decoding relies on fixed token budgets, leading to premature termination or inefficiency, while parallel decoding lacks inter-branch coordination and typically requires fine-tuning. This paper proposes a semantic entropy-guided multi-round parallel adaptive termination framework. For the first time, it introduces semantic entropy as an unsupervised, fine-tuning-free intrinsic quality metric, based on the observation that semantic diversity among parallel responses is negatively correlated with overall accuracy. By dynamically evaluating semantic entropy across inference rounds, the method enables coordinated path pruning and well-timed termination. Integrating the depth of sequential reasoning with the breadth of parallel exploration, it significantly improves accuracy and computational efficiency on complex tasks, including mathematical reasoning and commonsense QA, while reducing redundant generation, enhancing inference stability, and improving cross-task generalization.
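To make the metric concrete, here is a minimal sketch of computing semantic entropy over a batch of parallel responses: cluster responses into semantic-equivalence classes, then take the Shannon entropy of the cluster distribution. The `are_equivalent` predicate and the frequency-based cluster probabilities are simplifying assumptions for illustration, not the paper's exact formulation (which may, e.g., use an entailment model for equivalence).

```python
import math

def semantic_entropy(responses, are_equivalent):
    """Cluster responses into semantic-equivalence classes and return the
    Shannon entropy of the cluster distribution. Low entropy means the
    parallel branches agree, which (per the paper's observed negative
    correlation) signals higher expected accuracy."""
    clusters = []  # each cluster is a list of mutually equivalent responses
    for r in responses:
        for c in clusters:
            if are_equivalent(r, c[0]):
                c.append(r)
                break
        else:
            clusters.append([r])
    n = len(responses)
    probs = [len(c) / n for c in clusters]
    return -sum(p * math.log(p) for p in probs)

# Toy example: treat "same final answer string" as semantic equivalence.
h_agree = semantic_entropy(["42", "42", "42", "42"], lambda a, b: a == b)
h_split = semantic_entropy(["42", "42", "41", "40"], lambda a, b: a == b)
# h_agree is 0 (full agreement); h_split is strictly larger.
```

Any stronger notion of equivalence (bidirectional entailment, answer normalization) can be dropped in via the predicate without changing the entropy computation.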

📝 Abstract
Recent advances in large language models (LLMs) have accelerated progress toward artificial general intelligence, with inference-time scaling emerging as a key technique. Contemporary approaches leverage either sequential reasoning (iteratively extending chains of thought) or parallel reasoning (generating multiple solutions simultaneously) to scale inference. However, both paradigms face fundamental limitations: sequential scaling typically relies on arbitrary token budgets for termination, leading to inefficiency or premature cutoff, while parallel scaling often lacks coordination among parallel branches and requires intrusive fine-tuning to perform effectively. In light of these challenges, we aim to design a flexible test-time collaborative inference framework that exploits the complementary strengths of both sequential and parallel reasoning paradigms. Towards this goal, the core challenge lies in developing an efficient and accurate intrinsic quality metric to assess model responses during collaborative inference, enabling dynamic control and early termination of the reasoning trace. To address this challenge, we introduce semantic entropy (SE), which quantifies the semantic diversity of parallel model responses and serves as a robust indicator of reasoning quality due to its strong negative correlation with accuracy...
Problem

Research questions and friction points this paper is trying to address.

Optimize termination in multi-round parallel reasoning for efficiency
Balance sequential and parallel reasoning strengths in inference frameworks
Develop semantic entropy metric to assess reasoning quality dynamically
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantic entropy guides adaptive termination
Combines sequential and parallel reasoning strengths
Dynamic control via semantic diversity metric
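The three contributions above can be sketched together as a control loop: explore in parallel (breadth), measure semantic entropy, stop early once branches agree, otherwise prune to the majority cluster and reason another round (depth). This is a hypothetical sketch, not the paper's algorithm: `generate_parallel`, `semantic_entropy`, the fixed entropy threshold, the round budget, and the majority-vote pruning are all illustrative assumptions.

```python
from collections import Counter

def majority_answer(branches):
    """Return the most common response among the parallel branches."""
    return Counter(branches).most_common(1)[0][0]

def reason_with_adaptive_termination(prompt, generate_parallel, semantic_entropy,
                                     se_threshold=0.5, max_rounds=4, width=8):
    """Multi-round parallel reasoning that terminates adaptively:
    stop as soon as the parallel branches semantically agree
    (semantic entropy at or below the threshold)."""
    context = prompt
    for _ in range(max_rounds):
        branches = generate_parallel(context, n=width)  # breadth: parallel exploration
        if semantic_entropy(branches) <= se_threshold:  # branches agree -> stop early
            return majority_answer(branches)
        # depth: carry the dominant answer forward and reason another round
        context = context + "\n" + majority_answer(branches)
    return majority_answer(branches)  # round budget exhausted
```

With a stub generator that always answers "A" and an entropy stub returning 0, the loop terminates after a single round; with high entropy it runs until `max_rounds` before falling back to majority vote.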
Zenan Xu
Sun Yat-sen University
Zexuan Qiu
The Chinese University of Hong Kong
Guanhua Huang
LLM Department, Hunyuan T1 Team, Tencent
Kun Li
LLM Department, Hunyuan T1 Team, Tencent; The Chinese University of Hong Kong
Siheng Li
LLM Department, Hunyuan T1 Team, Tencent; The Chinese University of Hong Kong
Chenchen Zhang
LLM Department, Hunyuan T1 Team, Tencent
Kejiao Li
LLM Department, Hunyuan T1 Team, Tencent
Qi Yi
LLM Department, Hunyuan T1 Team, Tencent
Yuhao Jiang
Postdoc Researcher, EPFL
Bo Zhou
LLM Department, Hunyuan T1 Team, Tencent
Fengzong Lian
LLM Department, Hunyuan T1 Team, Tencent
Zhanhui Kang
LLM Department, Hunyuan T1 Team, Tencent