Corrector Sampling in Language Models

📅 2025-06-06

📈 Citations: 0

✨ Influential: 0

career value

150K/year

🤖 AI Summary

Autoregressive language models suffer from error accumulation due to their unidirectional generation mechanism. To address this, we propose Resample-Previous-Tokens (RPT), the first plug-and-play local resampling method integrated into standard autoregressive decoding—without modifying the model architecture. RPT iteratively backtracks and resamples previously generated tokens within a sliding window, enabling inference-time correction in a zero-fine-tuning setting; it also supports lightweight fine-tuning (using only ~100B tokens) for further gains. Evaluated on an 8B-parameter model, RPT achieves approximately 10% relative improvement on both programming and general reasoning benchmarks. It effectively mitigates error propagation while preserving decoding efficiency, striking a favorable balance between correction capability and computational overhead.

Technology Category

Application Category

📝 Abstract

Autoregressive language models accumulate errors due to their fixed, irrevocable left-to-right token generation. To address this, we propose a new sampling method called Resample-Previous-Tokens (RPT). RPT mitigates error accumulation by iteratively revisiting and potentially replacing tokens in a window of previously generated text. This method can be integrated into existing autoregressive models, preserving their next-token-prediction quality and speed. Fine-tuning a pretrained 8B parameter model with RPT for only 100B resulted in ~10% relative improvements on reasoning and coding benchmarks compared to the standard sampling.

Problem

Research questions and friction points this paper is trying to address.

Autoregressive models accumulate errors in left-to-right generation

Proposes RPT to iteratively correct previous token errors

Improves reasoning and coding benchmarks by 10%

Innovation

Methods, ideas, or system contributions that make the work stand out.

Iterative token resampling for error correction

Window-based previous token replacement

Minimal fine-tuning for performance boost

🔎 Similar Papers

No similar papers found.