🤖 AI Summary
Autoregressive language models suffer from error accumulation due to their unidirectional generation mechanism. To address this, we propose Resample-Previous-Tokens (RPT), the first plug-and-play local resampling method integrated into standard autoregressive decoding—without modifying the model architecture. RPT iteratively backtracks and resamples previously generated tokens within a sliding window, enabling inference-time correction in a zero-fine-tuning setting; it also supports lightweight fine-tuning (using only ~100B tokens) for further gains. Evaluated on an 8B-parameter model, RPT achieves approximately 10% relative improvement on both programming and general reasoning benchmarks. It effectively mitigates error propagation while preserving decoding efficiency, striking a favorable balance between correction capability and computational overhead.
📝 Abstract
Autoregressive language models accumulate errors due to their fixed, irrevocable left-to-right token generation. To address this, we propose a new sampling method called Resample-Previous-Tokens (RPT). RPT mitigates error accumulation by iteratively revisiting and potentially replacing tokens in a window of previously generated text. This method can be integrated into existing autoregressive models, preserving their next-token-prediction quality and speed. Fine-tuning a pretrained 8B parameter model with RPT for only 100B resulted in ~10% relative improvements on reasoning and coding benchmarks compared to the standard sampling.