🤖 AI Summary
Large language models exhibit limited generalization on out-of-distribution complex reasoning tasks—e.g., 100-digit addition, long-sequence deduction, and maze solving. This paper proposes a plug-and-play self-improvement framework for pretrained Transformers that enables weak-to-strong, self-generated curriculum learning without architectural modifications. The method integrates self-distillation, filtering of self-generated samples, iterative supervised fine-tuning, and difficulty-incremental training so that the model autonomously constructs and learns from high-quality solution trajectories. The core contribution is the "self-generated curriculum" paradigm: a systematic approach to enhancing logical extrapolation and length generalization. Experiments demonstrate order-of-magnitude generalization gains across arithmetic, string-manipulation, and maze-solving tasks (e.g., from 10- to 100-digit addition), with error rates decreasing exponentially across self-improvement rounds when correct self-generated examples are filtered. Moreover, starting from a pretrained initialization significantly accelerates convergence.
📝 Abstract
Large language models often struggle with length generalization and with solving complex problem instances beyond their training distribution. We present a self-improvement approach where models iteratively generate and learn from their own solutions, progressively tackling harder problems while maintaining a standard transformer architecture. Across diverse tasks including arithmetic, string manipulation, and maze solving, self-improvement enables models to solve problems far beyond their initial training distribution; for instance, generalizing from 10-digit to 100-digit addition without apparent saturation. We observe that in some cases filtering for correct self-generated examples leads to exponential improvements in out-of-distribution performance across training rounds. Additionally, starting from pretrained models significantly accelerates this self-improvement process for several tasks. Our results demonstrate how controlled weak-to-strong curricula can systematically teach a model logical extrapolation without any changes to the positional embeddings or the model architecture.
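The weak-to-strong loop described above (pose problems slightly harder than the current level, keep only self-generated answers that pass a correctness filter, fine-tune on the survivors, then raise the difficulty) can be sketched as a toy simulation for the addition task. Everything here is illustrative rather than taken from the paper: the names `make_problem` and `run_curriculum`, the advancement threshold, and the `solve` callable that stands in for the Transformer being fine-tuned.

```python
import random


def make_problem(n_digits, rng):
    """Sample two random n-digit operands (illustrative problem generator)."""
    a = rng.randrange(10 ** (n_digits - 1), 10 ** n_digits)
    b = rng.randrange(10 ** (n_digits - 1), 10 ** n_digits)
    return a, b


def self_improvement_round(solve, difficulty, n_samples, rng):
    """One round: pose problems one step beyond the current difficulty and
    keep only the self-generated answers that pass the correctness filter."""
    kept = []
    for _ in range(n_samples):
        a, b = make_problem(difficulty + 1, rng)
        answer = solve(a, b)
        if answer == a + b:  # exact-match filter on self-generated data
            kept.append(((a, b), answer))
    return kept


def run_curriculum(solve, start_digits, rounds, n_samples=200, seed=0):
    """Difficulty-incremental loop. In the paper each round would fine-tune
    the model on `kept`; this toy only advances the curriculum when enough
    filtered samples survive (threshold chosen arbitrarily)."""
    rng = random.Random(seed)
    difficulty = start_digits
    for _ in range(rounds):
        kept = self_improvement_round(solve, difficulty, n_samples, rng)
        if len(kept) >= n_samples // 2:  # reliable enough: go one step harder
            difficulty += 1
    return difficulty
```

For example, `run_curriculum(lambda a, b: a + b, 10, 5)` advances from 10- to 15-digit problems, while a solver whose answers never pass the filter stays at its starting difficulty, which is why the quality of the correctness filter drives the round-over-round gains the abstract reports.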