Reflection-Window Decoding: Text Generation with Selective Refinement

📅 2025-02-05

📈 Citations: 0

✨ Influential: 0

career value

189K/year

🤖 AI Summary

Autoregressive decoding in large language models (LLMs) lacks backtracking capability, leading to suboptimal sequences that deviate from the globally optimal joint probability distribution. To address this, we propose an uncertainty-aware selective refinement framework. Our method introduces token-level uncertainty estimation, integrated with a sliding reflective window and a dynamic pausing mechanism, enabling localized resampling and re-decoding during generation. The framework is plug-and-play—requiring no architectural modifications—and preserves high inference efficiency while approaching joint-probability optimality. Evaluated across diverse open-ended generation and reasoning benchmarks, it significantly improves factual consistency and fluency, yielding BLEU and ROUGE gains of 2.1–4.3 points, with inference latency overhead under 12%.

Technology Category

Application Category

📝 Abstract

The autoregressive decoding for text generation in large language models (LLMs), while widely used, is inherently suboptimal due to the lack of a built-in mechanism to perform refinement and/or correction of the generated content. In this paper, we consider optimality in terms of the joint probability over the generated response, when jointly considering all tokens at the same time. We theoretically characterize the potential deviation of the autoregressively generated response from its globally optimal counterpart that is of the same length. Our analysis suggests that we need to be cautious when noticeable uncertainty arises during text generation, which may signal the sub-optimality of the generation history. To address the pitfall of autoregressive decoding for text generation, we propose an approach that incorporates a sliding reflection window and a pausing criterion, such that refinement and generation can be carried out interchangeably as the decoding proceeds. Our selective refinement framework strikes a balance between efficiency and optimality, and our extensive experimental results demonstrate the effectiveness of our approach.

Problem

Research questions and friction points this paper is trying to address.

Improves text generation in LLMs

Addresses suboptimal autoregressive decoding

Introduces selective refinement mechanism

Innovation

Methods, ideas, or system contributions that make the work stand out.

sliding reflection window

pausing criterion

selective refinement framework

🔎 Similar Papers

No similar papers found.