🤖 AI Summary
Current large language models (LLMs) lack theoretical guarantees for self-improvement, relying predominantly on empirical heuristics. This paper introduces the first provably sound unsupervised self-improvement framework, grounded in **coherence**: a principle requiring model outputs to remain invariant under task-preserving input transformations. Methodologically, the authors design direct and two-step projection schemes that update the model via Bregman divergence minimization, ensuring monotonic improvement while staying as close as possible to the original model's behavior. The theoretical contributions are threefold: (1) a general characterization theorem proving that any mechanism with comparable provable improvement guarantees must inherently conform to a coherence-based structure; (2) robust extensions addressing non-realizable settings, finite-sample regimes, and relaxed coherence constraints; and (3) a rigorous proof that the expected Bregman divergence decreases monotonically, together with rigidity results under the demand for universal improvement. This work establishes coherence as the foundational and, in a formal sense, necessary principle for self-improvement in LLMs.
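As a toy illustration of the projection idea (this is our own minimal sketch, not the paper's construction), take the squared Euclidean distance as the Bregman divergence and a task whose output should be invariant under negating the input. The coherence set is then a linear constraint, and the Bregman projection onto it is simply the average of the model's outputs over the transformation orbit; the projected model can only get closer to any coherent ground truth:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: the task is invariant under negating the input
# (a task-preserving transformation), so a coherent model must satisfy
# f(x) == f(-x). This baseline model violates that constraint.
def model(x):
    return x**2 + 0.3 * x  # the 0.3*x term breaks the symmetry

def coherent_projection(f, x):
    # Bregman projection onto the coherence constraint under squared
    # Euclidean divergence: average outputs over the orbit {x, -x}.
    return 0.5 * (f(x) + f(-x))

xs = rng.normal(size=10_000)
target = xs**2  # a coherent ground-truth function

base_err = np.mean((model(xs) - target) ** 2)
proj_err = np.mean((coherent_projection(model, xs) - target) ** 2)
print(proj_err <= base_err)  # monotone improvement, as the theory predicts
```

Here the projection happens to recover the target exactly because the incoherent part of the model is precisely its antisymmetric component; in general the guarantee is only that the expected divergence does not increase.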
📝 Abstract
Self-improvement is a critical capability for large language models and other intelligent systems, enabling them to refine their behavior and internal consistency without external supervision. Despite its importance, prior approaches largely rely on empirical heuristics and lack formal guarantees. In this paper, we propose a principled framework for self-improvement based on the concept of *coherence*, which requires that a model's outputs remain consistent under task-preserving transformations of the input. We formalize this concept using projection-based mechanisms that update a baseline model to be coherent while remaining as close as possible to its original behavior. We provide rigorous theoretical guarantees that these mechanisms achieve *monotonic improvement*, measured by a reduction in expected Bregman divergence. Our analysis is comprehensive, covering both *direct* and *two-step* projection methods, and robustly extends these guarantees to non-realizable settings, empirical (finite-sample) distributions, and relaxed coherence constraints. Furthermore, we establish a general *characterization theorem*, showing that any mechanism with similar provable improvement guarantees must inherently conform to a coherence-based structure. This culminates in rigidity results under the demand for universal improvement, establishing coherence as a fundamental and, in a formal sense, necessary principle for provable self-improvement.
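One standard way to see why a Bregman projection yields monotone improvement (a sketch in our own notation, assuming the coherence set $\mathcal{C}$ is convex, which the paper's relaxed settings may generalize): let $D_\phi$ be the Bregman divergence of a convex generator $\phi$, let $p$ be the baseline model, and define the projection

$$
\Pi_{\mathcal{C}}(p) \;=\; \arg\min_{q \in \mathcal{C}} D_\phi(q \,\|\, p).
$$

The generalized Pythagorean inequality then gives, for every coherent $y^\star \in \mathcal{C}$,

$$
D_\phi\bigl(y^\star \,\|\, p\bigr) \;\ge\; D_\phi\bigl(y^\star \,\|\, \Pi_{\mathcal{C}}(p)\bigr) \;+\; D_\phi\bigl(\Pi_{\mathcal{C}}(p) \,\|\, p\bigr),
$$

so the projected model is never farther from any coherent target than the baseline, and is strictly closer whenever $p \notin \mathcal{C}$.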