On the Mechanisms of Weak-to-Strong Generalization: A Theoretical Perspective

📅 2025-05-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Weak-to-strong generalization—where a student model trained on noisy labels from a weaker teacher outperforms the teacher—lacks a rigorous theoretical explanation. This work provides the first systematic theoretical characterization of this phenomenon, identifying three key mechanisms: (1) student compensation for teacher under-regularization; (2) alignment advantage via weighted regularization structures; and (3) collaborative learning of easy/hard features in nonlinear settings. Using ridge regression, weighted ridge regression, and nonlinear multi-index models—within both parametric and pre-trained representation paradigms—we derive tight bounds and rigorously prove that, under specific regularization strengths and parameter configurations, the student’s test error can strictly fall below the teacher’s. Our analysis establishes the first unified theoretical foundation for knowledge distillation and self-training, bridging classical statistical learning theory with modern representation learning. The results highlight the critical role of regularization design and feature learning dynamics in weak-supervision transfer.

📝 Abstract
Weak-to-strong generalization, where a student model trained on imperfect labels generated by a weaker teacher nonetheless surpasses that teacher, has been widely observed, but the mechanisms that enable it have remained poorly understood. In this paper, through a theoretical analysis of simple models, we uncover three core mechanisms that can drive this phenomenon. First, by analyzing ridge regression, we study the interplay between teacher and student regularization and prove that a student can compensate for a teacher's under-regularization and achieve lower test error. We also analyze the role of the models' parameterization regime. Second, by analyzing weighted ridge regression, we show that a student model with a regularization structure more closely aligned with the target can outperform its teacher. Third, in a nonlinear multi-index setting, we demonstrate that a student can learn easy, task-specific features from the teacher while leveraging its own broader pre-training to learn hard-to-learn features that the teacher cannot capture.
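The first mechanism can be sketched in a small NumPy simulation (all dimensions, sample sizes, and penalty values below are illustrative assumptions, not the paper's settings): an under-regularized ridge teacher in the overparameterized regime essentially interpolates the label noise, while a student that only ever sees the teacher's predictions on fresh inputs, but applies a sensible ridge penalty of its own, ends up with lower excess test risk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Overparameterized setting: fewer labeled samples than dimensions.
n, d, sigma = 50, 80, 1.0
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)          # unit-norm ground-truth target

X = rng.standard_normal((n, d))
y = X @ w_star + sigma * rng.standard_normal(n)

def ridge(X, y, lam):
    """Ridge solution (X^T X + lam I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# "Weak teacher": near-zero penalty, so it badly overfits the noisy labels.
w_teacher = ridge(X, y, 1e-6)

# The student never sees the true labels: it fits the teacher's predictions
# on fresh unlabeled inputs, but with a nontrivial penalty that shrinks the
# teacher's noise-inflated estimate back toward zero.
X_u = rng.standard_normal((200, d))
y_u = X_u @ w_teacher                     # teacher-generated (weak) labels

# For isotropic inputs, excess test risk equals ||w_hat - w_star||^2.
def risk(w):
    return float(np.sum((w - w_star) ** 2))

teacher_risk = risk(w_teacher)
student_risks = {lam: risk(ridge(X_u, y_u, lam))
                 for lam in (10.0, 100.0, 400.0, 1000.0)}
best_student_risk = min(student_risks.values())
print(f"teacher risk: {teacher_risk:.3f}, "
      f"best student risk: {best_student_risk:.3f}")
```

Intuitively, the student's ridge penalty acts as the regularization the teacher was missing: shrinking the teacher's variance-dominated estimate trades a little bias for a large reduction in noise, which is exactly the compensation effect the abstract describes.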
Problem

Research questions and friction points this paper is trying to address.

Understanding the mechanisms behind weak-to-strong generalization
Analyzing regularization interplay between teacher and student models
Exploring parameterization regimes and feature learning in weak-to-strong generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Student compensates for the teacher's under-regularization (ridge regression analysis)
Student with target-aligned weighted ridge regularization outperforms its teacher
Student learns easy features from the teacher and hard features from its own pre-training
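The second mechanism, alignment advantage, admits a similarly small sketch (again with illustrative, assumed dimensions and penalty weights): when the target lives on a few coordinates, a student whose diagonal ridge penalty is aligned with that support can beat an isotropically regularized teacher, even though the student learns only from the teacher's labels.

```python
import numpy as np

rng = np.random.default_rng(1)

n, d, sigma = 60, 40, 1.0
support = 5                                # target lives on the first 5 coordinates
w_star = np.zeros(d)
w_star[:support] = rng.standard_normal(support)

X = rng.standard_normal((n, d))
y = X @ w_star + sigma * rng.standard_normal(n)

def weighted_ridge(X, y, penalties):
    """Weighted ridge solution (X^T X + diag(penalties))^{-1} X^T y."""
    return np.linalg.solve(X.T @ X + np.diag(penalties), X.T @ y)

# Teacher: isotropic ridge -- its penalty ignores where the signal lives,
# so every noise coordinate picks up estimation error.
w_teacher = weighted_ridge(X, y, np.full(d, 1.0))

# Student: trained only on the teacher's labels over fresh inputs, but with
# a penalty aligned to the target support (light on signal coordinates,
# heavy on the rest), which suppresses the teacher's off-support noise.
X_u = rng.standard_normal((150, d))
y_u = X_u @ w_teacher
penalties = np.where(np.arange(d) < support, 0.1, 500.0)
w_student = weighted_ridge(X_u, y_u, penalties)

teacher_risk = float(np.sum((w_teacher - w_star) ** 2))
student_risk = float(np.sum((w_student - w_star) ** 2))
print(f"teacher risk: {teacher_risk:.3f}, student risk: {student_risk:.3f}")
```

Here the student's advantage comes purely from its regularization structure: it inherits the teacher's estimate on the signal coordinates while aggressively shrinking the off-support coordinates, where the true weights are zero and the teacher's estimate is pure noise.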