Unleashing the True Potential of LLMs: A Feedback-Triggered Self-Correction with Long-Term Multipath Decoding

📅 2025-09-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) frequently generate erroneous content during reasoning, and existing self-correction approaches suffer from unreliable error localization signals and myopic inference caused by autoregressive, token-by-token decoding. To address these limitations, we propose Feedback-Triggered Regeneration with Long-Term Multi-path decoding (FTR-LTM), a novel self-correction framework featuring two core innovations: (1) a feedback-triggered regeneration mechanism that leverages external negative feedback—rather than error-prone internal self-assessment—to halt error propagation; and (2) long-term multi-path decoding, which defers sequence-level evaluation to enable parallel exploration of multiple reasoning paths, thereby overcoming the constraints of conventional single-step prediction. Evaluated on mathematical reasoning and code generation benchmarks, FTR-LTM consistently outperforms state-of-the-art prompt-based self-correction methods, delivering robust and sustained performance gains.

📝 Abstract
Large Language Models (LLMs) have achieved remarkable performance across diverse tasks, yet their susceptibility to generating incorrect content during inference remains a critical unsolved challenge. While self-correction methods offer potential solutions, their effectiveness is hindered by two inherent limitations: (1) the absence of reliable guidance signals for error localization, and (2) the restricted reasoning depth imposed by conventional next-token decoding paradigms. To address these issues, we propose Feedback-Triggered Regeneration (FTR), a novel framework that synergizes user feedback with enhanced decoding dynamics. Specifically, FTR activates response regeneration only upon receiving negative user feedback, thereby circumventing error propagation from faulty self-assessment while preserving originally correct outputs. Furthermore, we introduce Long-Term Multipath (LTM) decoding, which enables systematic exploration of multiple reasoning trajectories through delayed sequence evaluation, effectively overcoming the myopic decision-making characteristic of standard next-token prediction. Extensive experiments on mathematical reasoning and code generation benchmarks demonstrate that our framework achieves consistent and significant improvements over state-of-the-art prompt-based self-correction methods.
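The regeneration mechanism described in the abstract can be sketched as a small control loop. This is a minimal illustration, not the paper's implementation: `generate` and `is_accepted` are hypothetical stand-ins for the LLM call and the external feedback signal (a user verdict, unit test, etc.). The key property is that the model never judges its own output, so a correct first answer is returned untouched and regeneration fires only on explicit negative feedback.

```python
from typing import Callable

def feedback_triggered_regeneration(
    generate: Callable[[], str],
    is_accepted: Callable[[str], bool],
    max_retries: int = 3,
) -> str:
    """Regenerate a response only when external feedback rejects it.

    No internal self-assessment is involved: acceptance comes from an
    external signal, avoiding error propagation from faulty self-critique.
    """
    response = generate()
    retries = 0
    while not is_accepted(response) and retries < max_retries:
        response = generate()  # triggered solely by negative feedback
        retries += 1
    return response

# Toy stand-ins (hypothetical): a flaky generator and an external checker.
attempts = iter(["4", "5", "6"])
answer = feedback_triggered_regeneration(
    generate=lambda: next(attempts),
    is_accepted=lambda r: r == "6",  # e.g., a failing/passing unit test
)
print(answer)  # "6" after two rejected attempts
```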
Problem

Research questions and friction points this paper is trying to address.

LLMs generate incorrect content without reliable error guidance
Self-correction limited by shallow next-token decoding paradigms
Need systematic exploration of multiple reasoning trajectories
Innovation

Methods, ideas, or system contributions that make the work stand out.

Feedback-triggered regeneration for error correction
Long-term multipath decoding for reasoning depth
User feedback integration to guide regeneration
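The multipath idea above can be illustrated with a beam-search-style sketch over a toy next-token distribution. Everything here (the `TOY_LM` table, function names, width) is an assumption for illustration, not the paper's algorithm; it only shows why deferring sequence-level evaluation beats greedy next-token commitment. Greedy decoding would pick "2" first (probability 0.6) and end with total probability 0.30, while keeping multiple paths alive finds the sequence "3 +1" with probability 0.36.

```python
import math
from typing import Dict, List, Tuple

# Toy next-token distribution (hypothetical): maps a prefix to candidate
# tokens with probabilities; "<eos>" terminates a path.
TOY_LM: Dict[Tuple[str, ...], List[Tuple[str, float]]] = {
    (): [("2", 0.6), ("3", 0.4)],
    ("2",): [("+2", 0.5), ("<eos>", 0.5)],
    ("3",): [("+1", 0.9), ("<eos>", 0.1)],
    ("2", "+2"): [("<eos>", 1.0)],
    ("3", "+1"): [("<eos>", 1.0)],
}

def long_term_multipath(width: int = 2, max_len: int = 4) -> List[str]:
    """Expand several reasoning paths in parallel and evaluate them only
    once complete, instead of committing greedily token by token."""
    paths: List[Tuple[Tuple[str, ...], float]] = [((), 0.0)]
    finished: List[Tuple[Tuple[str, ...], float]] = []
    for _ in range(max_len):
        candidates: List[Tuple[Tuple[str, ...], float]] = []
        for prefix, logp in paths:
            for tok, p in TOY_LM.get(prefix, [("<eos>", 1.0)]):
                if tok == "<eos>":
                    finished.append((prefix, logp + math.log(p)))
                else:
                    candidates.append((prefix + (tok,), logp + math.log(p)))
        # Keep the `width` best partial paths; sequence-level comparison
        # is deferred until paths terminate.
        paths = sorted(candidates, key=lambda c: c[1], reverse=True)[:width]
        if not paths:
            break
    best = max(finished, key=lambda c: c[1])
    return list(best[0])

print(long_term_multipath())  # ['3', '+1'], the higher-probability full path
```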
👥 Authors
Jipeng Li
Tencent
Zeyu Gao
Tsinghua University
Yubin Qi
Tencent
Hande Dong
Tencent
machine learning, data mining, NLP
Weijian Chen
Institute of Dataspace, Hefei Comprehensive National Science Center
Qiang Lin
University of Rochester
Nonlinear Photonics, Quantum Photonics, Mechanical Photonics