A Task-Centric Theory for Iterative Self-Improvement with Easy-to-Hard Curricula

📅 2026-02-10
🤖 AI Summary
This work addresses the lack of theoretical understanding of iterative self-improvement of large language models in limited-data regimes, particularly the overlooked role of task difficulty and curriculum design. The authors model each self-improvement iteration as maximum-likelihood fine-tuning on a reward-filtered data distribution and, for the first time, establish finite-sample theoretical guarantees from a task-centric perspective. Their analysis quantifies the interplay among model capacity, task difficulty, and data budget, revealing that a curriculum progressing from easy to hard tasks enables sustained performance gains and mitigates premature saturation, outperforming static task mixtures. Empirical validation through Monte-Carlo simulations and graph-structured reasoning tasks confirms that this curriculum strategy significantly enhances model performance.

📝 Abstract
Iterative self-improvement fine-tunes an autoregressive large language model (LLM) on reward-verified outputs generated by the LLM itself. In contrast to the empirical success of self-improvement, the theoretical foundation of this generative, iterative procedure in a practical, finite-sample setting remains limited. We make progress toward this goal by modeling each round of self-improvement as maximum-likelihood fine-tuning on a reward-filtered distribution and deriving finite-sample guarantees for the expected reward. Our analysis reveals an explicit feedback loop where better models accept more data per iteration, supporting sustained self-improvement while explaining eventual saturation of such improvement. Adopting a task-centric view by considering reasoning tasks with multiple difficulty levels, we further prove quantifiable conditions on model initialization, task difficulty, and sample budget where easy-to-hard curricula provably achieve better guarantees than training on fixed mixtures of tasks. Our analyses are validated via Monte-Carlo simulations and controlled experiments on graph-based reasoning tasks.
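The loop the abstract describes (generate, reward-filter, fine-tune, repeat, advancing from easy to hard tasks) can be illustrated with a toy Monte-Carlo simulation in the spirit of the paper's validation. This is a hypothetical sketch, not the authors' code: the model is reduced to a per-difficulty success probability, "fine-tuning" is a simple update proportional to the accepted-data fraction (mimicking maximum-likelihood training on the reward-filtered distribution), and the cross-difficulty transfer factor is an illustrative assumption.

```python
import random

def self_improve(p0, difficulties, budget_per_round, rounds, lr=0.5):
    """Toy easy-to-hard self-improvement loop.

    p0 maps difficulty level -> initial success probability,
    a stand-in for the model's skill on tasks of that difficulty.
    """
    p = dict(p0)
    for d in sorted(difficulties):          # easy-to-hard curriculum
        for _ in range(rounds):
            # Generate: sample outputs; the reward filter keeps only successes.
            accepted = sum(random.random() < p[d]
                           for _ in range(budget_per_round))
            frac = accepted / budget_per_round
            # "Fine-tune": skill grows with the accepted-data fraction,
            # mimicking MLE on the reward-filtered distribution. Better
            # models accept more data, so gains compound until saturation.
            p[d] += lr * frac * (1 - p[d])
            # Assumed transfer: progress on easy tasks slightly helps
            # harder ones (illustrative, not from the paper).
            for h in difficulties:
                if h > d:
                    p[h] += 0.1 * lr * frac * (1 - p[h])
    return p

if __name__ == "__main__":
    random.seed(0)
    p0 = {1: 0.5, 2: 0.1, 3: 0.02}
    final = self_improve(p0, [1, 2, 3], budget_per_round=200, rounds=5)
    print(final)
```

Because the accepted fraction multiplies the update, a task the model rarely solves yields almost no training signal, which is the intuition behind why an easy-to-hard ordering sustains improvement where a fixed hard-task mixture stalls.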
Problem

Research questions and friction points this paper is trying to address.

iterative self-improvement
finite-sample guarantee
easy-to-hard curriculum
task difficulty
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

iterative self-improvement
easy-to-hard curriculum
finite-sample guarantee
task-centric theory
reward-filtered fine-tuning