🤖 AI Summary
Current large language models (LLMs) lack theoretical guarantees for self-improvement, relying predominantly on empirical heuristics. This paper introduces the first provably sound unsupervised self-improvement framework, grounded in **coherence**: a principle requiring model outputs to remain invariant under task-preserving input transformations. Methodologically, the authors design direct and two-step projection schemes that update the model via Bregman divergence minimization, ensuring monotonic improvement while staying as close as possible to the original model's behavior. The theoretical contributions are threefold: (1) a general characterization theorem proving that any mechanism with comparable provable improvement guarantees must inherently conform to a coherence-based structure; (2) robust extensions addressing non-realizable settings, finite-sample regimes, and relaxed coherence constraints; and (3) a rigorous proof that the expected Bregman divergence decreases monotonically, together with rigidity results under the demand for universal improvement. This work establishes coherence as the foundational and, in a formal sense, necessary principle for self-improvement in LLMs.
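As a toy illustration of the projection idea (this is our own minimal sketch, not the paper's construction), take the squared Euclidean distance as the Bregman divergence and a task whose output should be invariant under negating the input. The coherence set is then a linear constraint, and the Bregman projection onto it is simply the average of the model's outputs over the transformation orbit; the projected model can only get closer to any coherent ground truth:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: the task is invariant under negating the input
# (a task-preserving transformation), so a coherent model must satisfy
# f(x) == f(-x). This baseline model violates that constraint.
def model(x):
    return x**2 + 0.3 * x  # the 0.3*x term breaks the symmetry

def coherent_projection(f, x):
    # Bregman projection onto the coherence constraint under squared
    # Euclidean divergence: average outputs over the orbit {x, -x}.
    return 0.5 * (f(x) + f(-x))

xs = rng.normal(size=10_000)
target = xs**2  # a coherent ground-truth function

base_err = np.mean((model(xs) - target) ** 2)
proj_err = np.mean((coherent_projection(model, xs) - target) ** 2)
print(proj_err <= base_err)  # monotone improvement, as the theory predicts
```

Here the projection happens to recover the target exactly because the incoherent part of the model is precisely its antisymmetric component; in general the guarantee is only that the expected divergence does not increase.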
📝 Abstract
Self-improvement is a critical capability for large language models and other intelligent systems, enabling them to refine their behavior and internal consistency without external supervision. Despite its importance, prior approaches largely rely on empirical heuristics and lack formal guarantees. In this paper, we propose a principled framework for self-improvement based on the concept of *coherence*, which requires that a model's outputs remain consistent under task-preserving transformations of the input. We formalize this concept using projection-based mechanisms that update a baseline model to be coherent while remaining as close as possible to its original behavior. We provide rigorous theoretical guarantees that these mechanisms achieve *monotonic improvement*, measured by a reduction in expected Bregman divergence. Our analysis is comprehensive, covering both *direct* and *two-step* projection methods, and robustly extends these guarantees to non-realizable settings, empirical (finite-sample) distributions, and relaxed coherence constraints. Furthermore, we establish a general *characterization theorem*, showing that any mechanism with similar provable improvement guarantees must inherently conform to a coherence-based structure. This culminates in rigidity results under the demand for universal improvement, establishing coherence as a fundamental and, in a formal sense, necessary principle for provable self-improvement.
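One standard way to see why a Bregman projection yields monotone improvement (a sketch in our own notation, assuming the coherence set $\mathcal{C}$ is convex, which the paper's relaxed settings may generalize): let $D_\phi$ be the Bregman divergence of a convex generator $\phi$, let $p$ be the baseline model, and define the projection

$$
\Pi_{\mathcal{C}}(p) \;=\; \arg\min_{q \in \mathcal{C}} D_\phi(q \,\|\, p).
$$

The generalized Pythagorean inequality then gives, for every coherent $y^\star \in \mathcal{C}$,

$$
D_\phi\bigl(y^\star \,\|\, p\bigr) \;\ge\; D_\phi\bigl(y^\star \,\|\, \Pi_{\mathcal{C}}(p)\bigr) \;+\; D_\phi\bigl(\Pi_{\mathcal{C}}(p) \,\|\, p\bigr),
$$

so the projected model is never farther from any coherent target than the baseline, and is strictly closer whenever $p \notin \mathcal{C}$.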