🤖 AI Summary
This work proposes a training-free regeneration paradigm for the self-improvement of large language models, addressing the long-standing trade-off between efficiency and accuracy in existing approaches: iterative verification-rectification incurs high computational cost and risks entrenching erroneous reasoning, while multi-sampling strategies fail to correct the model's inherent flaws. The approach introduces a contrastive Reflection Memory, curated offline, which couples a self-verification mechanism with a from-scratch regeneration strategy, enabling the model to escape erroneous reasoning paths within a single inference pass. Evaluated across nine benchmarks spanning algorithmic, reasoning, symbolic, and domain-specific tasks, the method significantly outperforms state-of-the-art techniques while keeping computational overhead low.
📝 Abstract
Verification-guided self-improvement has recently emerged as a promising approach to improving the accuracy of large language model (LLM) outputs. However, existing approaches face a trade-off between inference efficiency and accuracy: iterative verification-rectification is computationally expensive and prone to becoming trapped in faulty reasoning, while best-of-N selection requires extensive sampling without addressing internal model flaws. We propose a training-free regeneration paradigm that leverages an offline-curated contrastive Reflection Memory (RM) to provide corrective guidance, while regenerating from scratch helps the model break out of faulty reasoning. At inference time, the method performs RM-guided self-verification followed by a single RM-guided regeneration, avoiding both iterative correction and multi-sample selection. We evaluate our method on nine benchmarks spanning algorithmic, reasoning, symbolic, and domain-specific tasks, using both small- and large-scale LLMs. Experimental results show that our method outperforms prior methods while maintaining low computational cost.
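The inference-time pipeline described above — retrieve reflections from an offline-curated memory, verify the draft answer, then regenerate from scratch at most once — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the names `ReflectionMemory`, `embed`, and `llm`, the cosine-similarity retrieval, and the PASS/FAIL prompt format are all assumptions introduced here for clarity.

```python
# Hypothetical sketch of RM-guided verify-then-regenerate inference.
# ReflectionMemory, embed, and llm are illustrative stand-ins, not the
# paper's actual components.
import math

class ReflectionMemory:
    """Offline-curated store of contrastive (faulty vs. corrected) reflections."""
    def __init__(self):
        self.entries = []  # list of (embedding, reflection_text) pairs

    def add(self, embedding, reflection):
        self.entries.append((embedding, reflection))

    def retrieve(self, query, k=2):
        # Rank stored reflections by cosine similarity to the query embedding.
        def cos(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(y * y for y in b))
            return dot / (na * nb) if na and nb else 0.0
        ranked = sorted(self.entries, key=lambda e: cos(query, e[0]), reverse=True)
        return [text for _, text in ranked[:k]]

def self_improve(question, draft, embed, llm, memory):
    """One RM-guided verification pass, then at most one from-scratch regeneration."""
    reflections = memory.retrieve(embed(question))
    verdict = llm(
        f"Verify this answer.\nGuidance: {reflections}\n"
        f"Q: {question}\nA: {draft}\nReply PASS or FAIL."
    )
    if verdict.strip() == "PASS":
        return draft  # no extra generation cost when the draft already verifies
    # Regenerate from scratch (rather than patching the draft), guided by RM,
    # so the model is not anchored to its earlier faulty reasoning path.
    return llm(
        f"Solve from scratch, avoiding these known pitfalls: {reflections}\n"
        f"Q: {question}"
    )
```

Note the contrast with the two baselines in the abstract: unlike iterative verification-rectification, the loop runs at most once, and unlike best-of-N selection, only a single new sample is drawn, and only when verification fails.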