On the Blessing of Pre-training in Weak-to-Strong Generalization

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work investigates the role of pre-training in weak-to-strong generalization (W2SG) and the theoretical mechanism behind it. The authors model pre-training as a spectral initialization step in a high-dimensional single-index framework with spiked Gaussian data, and show that it provides a geometric warm start that places the model in an "effective region" with perturbed strong-convexity geometry. Within this region they derive a generalization bound for W2SG that captures an initial improvement followed by a saturation set by the weak supervisor's bias. Combining synthetic simulations with evaluations of hundreds of intermediate pre-training checkpoints from large language models, they argue that W2SG is not an innate property of a model but emerges through a phase transition during pre-training, which makes pre-training an essential prerequisite for W2SG.

📝 Abstract

The paradigm of Weak-to-Strong Generalization (W2SG) suggests that a pre-trained strong model can surpass its weak supervisor, yet the decisive role of pre-training remains theoretically and empirically under-explored. In this work, we identify pre-training as the essential prerequisite for the emergence of W2SG. Theoretically, we formalize the W2SG problem within a high-dimensional single-index model framework using spiked Gaussian data, modeling pre-training as a spectral initialization step. Building upon prior impossibility results regarding the failure of learning under random initialization, we prove that W2SG is achievable when pre-training provides a geometric warm start that places the model within an "effective region" characterized by a perturbed strong-convexity geometry. Within this region, we derive a rigorous generalization bound that naturally captures the optimization dynamics: an initial performance improvement followed by a saturation bottleneck dictated by the weak supervisor's bias. Empirically, we first validate all our assumptions and theoretical insights through controlled synthetic simulations. Finally, through a massive-scale evaluation of hundreds of intermediate pre-training checkpoints from large language models, we demonstrate that W2SG is not an innate capability but emerges via a phase transition tightly coupled with the progression of pre-training.
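To make the "geometric warm start" idea concrete, the following is a minimal sketch and not the paper's construction. It assumes spiked Gaussian data drawn from N(0, I + s·vvᵀ), assumes the target direction coincides with the spike v, and takes spectral initialization to mean the top eigenvector of the sample covariance. The names `spiked_gaussian`, `spectral_init`, and `overlap`, and the values of `d`, `n`, and `strength`, are all hypothetical choices for illustration. The sketch compares how well a spectral start and a random start align with the hidden direction.

```python
import numpy as np


def spiked_gaussian(n, d, spike, strength, rng):
    """Draw n samples from N(0, I + strength * spike spike^T).

    Assumption: this rank-one spiked covariance stands in for the
    'spiked Gaussian data' named in the abstract.
    """
    noise = rng.standard_normal((n, d))      # isotropic part, covariance I
    coeff = rng.standard_normal((n, 1))      # scalar factor along the spike
    return noise + np.sqrt(strength) * coeff * spike


def spectral_init(x):
    """Return the top eigenvector of the sample covariance of x."""
    cov = x.T @ x / x.shape[0]
    _, eigvecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    return eigvecs[:, -1]


def overlap(u, v):
    """Absolute cosine similarity between two directions."""
    return abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))


rng = np.random.default_rng(0)
d, n, strength = 200, 4000, 5.0              # illustrative values only

# Hidden unit-norm direction (assumed to be both spike and target).
spike = rng.standard_normal(d)
spike /= np.linalg.norm(spike)

x = spiked_gaussian(n, d, spike, strength, rng)

w_spectral = spectral_init(x)                # "pre-trained" warm start
w_random = rng.standard_normal(d)            # random initialization

spec_overlap = overlap(w_spectral, spike)
rand_overlap = overlap(w_random, spike)

print(f"spectral init overlap: {spec_overlap:.3f}")
print(f"random init overlap:   {rand_overlap:.3f}")
```

Under these assumptions the spectral start is nearly aligned with the hidden direction, while the random start has an overlap on the order of 1/sqrt(d). This is the sense in which a warm start can place the optimizer near the target before any weak supervision is used.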

Problem

Research questions and friction points this paper is trying to address.

Weak-to-Strong Generalization

pre-training

generalization

spectral initialization

phase transition

Innovation

Methods, ideas, or system contributions that make the work stand out.

Weak-to-Strong Generalization

pre-training

spectral initialization