🤖 AI Summary
Large language models (LLMs) rely on long-chain reasoning for complex tasks, yet their autoregressive, left-to-right generation is vulnerable to error propagation from early incorrect tokens. Existing self-reflection methods, such as full-draft revision or costly fine-tuning, are reactive and inefficient. This paper introduces SRGen, a lightweight, test-time framework enabling fine-grained, on-the-fly self-reflection during generation: it dynamically identifies high-uncertainty tokens via entropy-based thresholds and performs localized probability distribution correction conditioned on already-generated context, requiring neither full-draft re-generation nor model fine-tuning. SRGen is plug-and-play, computationally efficient, and agnostic to underlying training or inference strategies. Evaluated on mathematical reasoning benchmarks including AIME2024 with DeepSeek-R1-Distill-Qwen-7B, SRGen achieves absolute improvements of +12.0% in Pass@1 and +13.3% in Cons@5, substantially enhancing both single-sample output quality and self-consistency.
📝 Abstract
Large language models (LLMs) increasingly solve complex reasoning tasks via long chain-of-thought, but their forward-only autoregressive generation process is fragile: early token errors can cascade, which creates a clear need for self-reflection mechanisms. However, existing self-reflection either performs revisions over full drafts or learns self-correction via expensive training, both of which are fundamentally reactive and inefficient. To address this, we propose Self-Reflective Generation at Test Time (SRGen), a lightweight test-time framework that reflects before generating at uncertain points. During token generation, SRGen uses dynamic entropy thresholding to identify high-uncertainty tokens. For each identified token, it trains a token-specific corrective vector that fully exploits the already generated context to correct the token's probability distribution through self-reflective generation. By retrospectively analyzing the partial output, this self-reflection enables more trustworthy decisions, thereby significantly reducing the probability of errors at highly uncertain points. Evaluated on challenging mathematical reasoning benchmarks and a diverse set of LLMs, SRGen consistently strengthens model reasoning: improvements in single-pass quality also translate into stronger self-consistency voting. In particular, on AIME2024 with DeepSeek-R1-Distill-Qwen-7B, SRGen yields absolute improvements of +12.0% on Pass@1 and +13.3% on Cons@5. Moreover, our findings position SRGen as a plug-and-play method that integrates reflection into the generation process for reliable LLM reasoning, achieving consistent gains with bounded overhead and broad composability with other training-time (e.g., RLHF) and test-time (e.g., SLOT) techniques.
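To make the two core ideas concrete, here is a minimal, illustrative sketch of (a) entropy-based detection of high-uncertainty tokens via a dynamic threshold, and (b) a test-time "corrective vector" optimized by gradient descent to sharpen the next-token distribution. This is not the paper's actual objective (SRGen's corrective vector is trained on hidden states conditioned on the generated context); the threshold rule (mean plus a multiple of the standard deviation of recent entropies), the entropy-minimization loss, and all function names here are simplifying assumptions for illustration only.

```python
import numpy as np

def entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution over logits."""
    z = logits - logits.max()          # shift for numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return -(p * np.log(p + 1e-12)).sum()

def is_uncertain(logits, recent_entropies, k=1.5):
    """Hypothetical dynamic threshold: flag a token whose entropy exceeds
    mean + k * std of recently observed token entropies."""
    h = entropy(logits)
    thr = np.mean(recent_entropies) + k * np.std(recent_entropies)
    return h > thr, h

def corrective_vector(logits, steps=50, lr=0.5, lam=0.01):
    """Toy test-time corrective vector: an additive logit offset `delta`
    optimized to reduce the entropy of the corrected distribution, with an
    L2 penalty keeping the correction small. (A stand-in loss; SRGen's
    loss is context-conditioned, which this sketch omits.)"""
    delta = np.zeros_like(logits)
    for _ in range(steps):
        z = logits + delta
        zs = z - z.max()
        p = np.exp(zs) / np.exp(zs).sum()
        h = -(p * np.log(p + 1e-12)).sum()
        grad_h = -p * (np.log(p + 1e-12) + h)    # analytic dH/dz
        delta -= lr * (grad_h + 2 * lam * delta)  # minimize H + lam*||delta||^2
    return delta

# Usage: a flat-ish distribution is flagged, then sharpened in place.
logits = np.array([1.0, 0.9, 0.8, 0.2, 0.1])
flagged, h0 = is_uncertain(logits, recent_entropies=[0.5, 0.6, 0.4])
h1 = entropy(logits + corrective_vector(logits))
```

The key property this mirrors from the paper is locality: only the flagged position's distribution is corrected, with no re-generation of earlier tokens and no update to model weights.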