Excess Description Length of Learning Generalizable Predictors

📅 2026-01-08
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses the fundamental question of whether fine-tuning large language models elicits pre-existing capabilities or teaches genuinely new ones, and quantifies where their generalization ability comes from. To this end, the authors develop an information-theoretic framework grounded in prequential coding and introduce the Excess Description Length (EDL) metric, which measures how much predictive structure fine-tuning extracts from data and writes into model parameters. Theoretical analysis establishes that EDL is non-negative in expectation, converges to the surplus description length in the infinite-data limit, and bounds the expected generalization gain. Empirical validation on synthetic tasks demonstrates EDL's explanatory power for phenomena such as label-noise sensitivity, rare-input handling, and format learning. The work further shows that capability elicitation and teaching exhibit qualitatively distinct scaling signatures, clarifying several common confusions about information acquisition during fine-tuning.

📝 Abstract
Understanding whether fine-tuning elicits latent capabilities or teaches new ones is a fundamental question for language model evaluation and safety. We develop a formal information-theoretic framework for quantifying how much predictive structure fine-tuning extracts from the train dataset and writes into a model's parameters. Our central quantity, Excess Description Length (EDL), is defined via prequential coding and measures the gap between the bits required to encode training labels sequentially using an evolving model (trained online) and the residual encoding cost under the final trained model. We establish that EDL is non-negative in expectation, converges to surplus description length in the infinite-data limit, and provides bounds on expected generalization gain. Through a series of toy models, we clarify common confusions about information in learning: why random labels yield EDL near zero, how a single example can eliminate many bits of uncertainty about the underlying rule(s) that describe the data distribution, why structure learned on rare inputs contributes proportionally little to expected generalization, and how format learning creates early transients distinct from capability acquisition. This framework provides rigorous foundations for the empirical observation that capability elicitation and teaching exhibit qualitatively distinct scaling signatures.
Problem

Research questions and friction points this paper is trying to address.

fine-tuning
latent capabilities
generalization
language model evaluation
model safety
Innovation

Methods, ideas, or system contributions that make the work stand out.

Excess Description Length
prequential coding
generalization
fine-tuning
information theory
Elizabeth Donoway
Department of Physics, University of California, Berkeley, CA, USA
Hailey Joren
UC San Diego
human-centric machine learning
Fabien Roger
Anthropic, San Francisco, CA, USA
Jan Leike
Anthropic
reinforcement learning, deep learning, agent alignment