Reflection-Window Decoding: Text Generation with Selective Refinement

📅 2025-02-05
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Autoregressive decoding in large language models (LLMs) lacks backtracking capability, leading to suboptimal sequences that deviate from the globally optimal joint probability distribution. To address this, we propose an uncertainty-aware selective refinement framework. Our method introduces token-level uncertainty estimation, integrated with a sliding reflective window and a dynamic pausing mechanism, enabling localized resampling and re-decoding during generation. The framework is plug-and-play—requiring no architectural modifications—and preserves high inference efficiency while approaching joint-probability optimality. Evaluated across diverse open-ended generation and reasoning benchmarks, it significantly improves factual consistency and fluency, yielding BLEU and ROUGE gains of 2.1–4.3 points, with inference latency overhead under 12%.

Technology Category

Application Category

📝 Abstract
The autoregressive decoding for text generation in large language models (LLMs), while widely used, is inherently suboptimal due to the lack of a built-in mechanism to perform refinement and/or correction of the generated content. In this paper, we consider optimality in terms of the joint probability over the generated response, when jointly considering all tokens at the same time. We theoretically characterize the potential deviation of the autoregressively generated response from its globally optimal counterpart that is of the same length. Our analysis suggests that we need to be cautious when noticeable uncertainty arises during text generation, which may signal the sub-optimality of the generation history. To address the pitfall of autoregressive decoding for text generation, we propose an approach that incorporates a sliding reflection window and a pausing criterion, such that refinement and generation can be carried out interchangeably as the decoding proceeds. Our selective refinement framework strikes a balance between efficiency and optimality, and our extensive experimental results demonstrate the effectiveness of our approach.
Problem

Research questions and friction points this paper is trying to address.

Improves text generation in LLMs
Addresses suboptimal autoregressive decoding
Introduces selective refinement mechanism
Innovation

Methods, ideas, or system contributions that make the work stand out.

sliding reflection window
pausing criterion
selective refinement framework
🔎 Similar Papers
No similar papers found.
Zeyu Tang
Zeyu Tang
Postdoctoral Scholar, Stanford University
Trustworthy AICausalityComputational Justice
Zhenhao Chen
Zhenhao Chen
MBZUAI
CausalityMachine LearningRepresentation LearningLLMMultimodal AI
Loka Li
Loka Li
Mohamed bin Zayed University of Artificial Intelligence
Machine LearningCausality
Xiangchen Song
Xiangchen Song
Carnegie Mellon University
Machine LearningCausalityData Mining
Y
Yunlong Deng
Machine Learning Department, Mohamed bin Zayed University of Artificial Intelligence
Y
Yifan Shen
Machine Learning Department, Mohamed bin Zayed University of Artificial Intelligence
G
Guangyi Chen
Machine Learning Department, Mohamed bin Zayed University of Artificial Intelligence
Peter Spirtes
Peter Spirtes
Professor of Philosophy, Carnegie Mellon University
Machine LearningCausal Inference
K
Kun Zhang
Department of Philosophy, Carnegie Mellon University; Machine Learning Department, Mohamed bin Zayed University of Artificial Intelligence