🤖 AI Summary
Existing unsupervised self-improvement mechanisms for language models, such as debate, bootstrapping, and internal coherence maximization, lack a unified theoretical foundation. This work formalizes such self-improvement as a coherence optimization problem: finding the context-to-behavior mapping that is most compressible and jointly predictable. The authors prove that this formulation is equivalent to description-length regularization and, when the regularizer is derived from a pretrained model, is optimal for semi-supervised learning. Grounded in information theory, the framework not only provides a principled theoretical basis for existing methods but also predicts the conditions under which they succeed or fail. Preliminary experiments support the theory.
📝 Abstract
Can language models improve their accuracy without external supervision? Methods such as debate, bootstrapping, and internal coherence maximization achieve this surprising feat, in some cases even matching the performance of finetuning on golden labels. Yet why they work remains theoretically unclear. We show that they are all special cases of coherence optimization: finding the context-to-behavior mapping that is most compressible and jointly predictable. We prove that coherence optimization is equivalent to description-length regularization, and that among all such regularization schemes, it is optimal for semi-supervised learning when the regularizer is derived from a pretrained model. Our theory, supported by preliminary experiments, explains why feedback-free self-improvement works and predicts when it should succeed or fail.
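The core idea, picking the labeling whose joint (context, behavior) description is most compressible, can be illustrated with a toy sketch. Here a generic compressor (`zlib`) stands in for the code lengths a pretrained model would assign; that substitution is purely an illustrative assumption, since in the paper the regularizer comes from a pretrained language model, not from an off-the-shelf compressor.

```python
import zlib


def description_length(contexts, labels):
    """Proxy for the joint code length of a context-to-behavior mapping.

    A shorter compressed encoding means the mapping is more compressible
    and its parts more mutually predictable (the coherence criterion).
    """
    joint = "".join(f"{c}{y}\n" for c, y in zip(contexts, labels)).encode()
    return len(zlib.compress(joint, level=9))


def most_coherent(contexts, candidate_labelings):
    """Coherence optimization as MDL: return the candidate labeling
    that minimizes the joint description length."""
    return min(candidate_labelings,
               key=lambda ys: description_length(contexts, ys))


# Hypothetical unlabeled questions, each repeated several times.
contexts = ["2+2=", "5+1=", "2+2=", "5+1=", "2+2=", "5+1="]
consistent = ["4", "6", "4", "6", "4", "6"]    # same answer to same question
inconsistent = ["4", "6", "5", "7", "3", "6"]  # contradicts itself

best = most_coherent(contexts, [consistent, inconsistent])
```

The self-consistent labeling compresses better because its repeated (question, answer) pairs are redundant, so the MDL objective selects it without any external labels; this is the sense in which the compressibility view predicts when such methods succeed (redundant, mutually predictive behavior exists to exploit) or fail.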