🤖 AI Summary
This work investigates whether a weak teacher model (e.g., GPT-2) can effectively guide a stronger student model (e.g., GPT-4) to achieve “weak-to-strong” generalization.
Method: Focusing on random feature networks, we train a large student model solely on pseudo-labels generated by the weak teacher and analyze its generalization via bounds on the population risk, without assuming architectural priors of large language models.
Contribution/Results: We establish that early stopping is the key mechanism enabling performance gains; the generalization improvement is jointly determined by the student’s feature dimensionality and stopping time, and admits a quantifiable upper bound. Crucially, this is the first work to rigorously prove weak-to-strong generalization within a formal theoretical framework—specifically for linearized models such as two-layer networks and random feature models—and to provide explicit, quantitative characterizations of the gain. Our analysis furnishes novel theoretical foundations for knowledge distillation and model compression.
📝 Abstract
Weak-to-Strong Generalization (Burns et al., 2024) is the phenomenon whereby a strong student, say GPT-4, learns a task from a weak teacher, say GPT-2, and ends up significantly outperforming the teacher. We show that this phenomenon does not require a strong learner like GPT-4. We consider a student and teacher that are random feature models, described by two-layer networks with a random and fixed bottom layer and a trained top layer. A "weak" teacher, with a small number of units (i.e., random features), is trained on the population, and a "strong" student, with a much larger number of units (i.e., random features), is trained only on labels generated by the weak teacher. We demonstrate, prove, and understand how the student can outperform the teacher, even though it is trained only on data labeled by the teacher. We also explain how such weak-to-strong generalization is enabled by early stopping. Importantly, we also show the quantitative limits of weak-to-strong generalization in this model.
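The setup described in the abstract can be sketched as a small numerical simulation. This is a minimal illustration, not the paper's exact construction: the toy target function, ReLU features, and all dimensions below are hypothetical choices made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 20  # input dimension (hypothetical choice)

def target(X):
    # toy ground-truth function standing in for the population labels
    return np.sin(X @ np.ones(d) / np.sqrt(d))

def random_features(X, W):
    # fixed random bottom layer with ReLU units; only the top layer is trained
    return np.maximum(X @ W, 0.0) / np.sqrt(W.shape[1])

# "Weak" teacher: few random features, fit by least squares on a large
# sample (a proxy for training on the population)
k_teacher, k_student = 50, 1000
X_pop = rng.standard_normal((20000, d))
W_t = rng.standard_normal((d, k_teacher))
a_t, *_ = np.linalg.lstsq(random_features(X_pop, W_t), target(X_pop), rcond=None)

# "Strong" student: many more random features, trained ONLY on the
# teacher's pseudo-labels, by gradient descent on the squared loss
n_train = 1000
X_tr = rng.standard_normal((n_train, d))
y_teacher = random_features(X_tr, W_t) @ a_t
W_s = rng.standard_normal((d, k_student))
Phi_s = random_features(X_tr, W_s)

# Held-out population risk, measured against the TRUE labels
X_test = rng.standard_normal((5000, d))
y_test = target(X_test)
Phi_s_test = random_features(X_test, W_s)
teacher_risk = np.mean((random_features(X_test, W_t) @ a_t - y_test) ** 2)

a_s = np.zeros(k_student)
lr = 0.1
best_risk = np.inf  # population risk at the best early-stopping time
for step in range(500):
    a_s -= lr * Phi_s.T @ (Phi_s @ a_s - y_teacher) / n_train
    if (step + 1) % 10 == 0:
        risk = np.mean((Phi_s_test @ a_s - y_test) ** 2)
        best_risk = min(best_risk, risk)

print(f"teacher population risk    ~ {teacher_risk:.4f}")
print(f"student early-stopped risk ~ {best_risk:.4f}")
```

Whether the student's early-stopped risk actually dips below the teacher's depends on the feature dimensionality and the stopping time, exactly the quantities the paper's bounds are stated in terms of; sweeping `k_student` and the stopping step in this sketch gives a qualitative feel for that trade-off.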