🤖 AI Summary
This paper investigates the “weak-to-strong generalization” phenomenon—where a stronger student model, trained under supervision from a weaker teacher, outperforms that teacher—focusing on the transition from a linear CNN teacher to a two-layer ReLU CNN student. Existing theoretical analyses are limited to linear or random-feature models and fail to capture the mechanisms at work in practical neural networks.
Method: We give the first formal analysis of this phenomenon in nonlinear CNNs, training a two-layer ReLU CNN via gradient descent on pseudo-labels generated by a pretrained linear CNN over structured data, and systematically model the interplay between signal difficulty and the noise distribution throughout training.
Results: Our theory reveals two distinct regimes: (i) in the data-scarce regime, a phase transition between benign and harmful overfitting; and (ii) in the data-abundant regime, early label correction followed by overtraining-induced degradation. We precisely characterize the boundary conditions and training-dynamics pathways of weak-to-strong generalization, establishing the first provable, structured explanatory framework for this phenomenon in nonlinear networks.
📝 Abstract
Weak-to-strong generalization refers to the phenomenon where a stronger model trained under supervision from a weaker one can outperform its teacher. While prior studies aim to explain this effect, most theoretical insights are limited to abstract frameworks or linear/random feature models. In this paper, we provide a formal analysis of weak-to-strong generalization from a linear CNN (weak) to a two-layer ReLU CNN (strong). We consider structured data composed of label-dependent signals of varying difficulty and label-independent noise, and analyze gradient descent dynamics when the strong model is trained on data labeled by the pretrained weak model. Our analysis identifies two regimes -- data-scarce and data-abundant -- based on the signal-to-noise characteristics of the dataset, and reveals distinct mechanisms of weak-to-strong generalization. In the data-scarce regime, generalization occurs via benign overfitting or fails via harmful overfitting, depending on the amount of data, and we characterize the transition boundary. In the data-abundant regime, generalization emerges in the early phase through label correction, but we observe that overtraining can subsequently degrade performance.
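To make the setup concrete, here is a minimal NumPy sketch of the pipeline the abstract describes: each example consists of a label-dependent signal patch and a label-independent noise patch, a linear filter stands in for the pretrained weak (linear CNN) teacher, and a two-layer ReLU network with one shared filter bank across patches is trained by gradient descent on the teacher's pseudo-labels. All dimensions, the signal strength, the noise scale, and the teacher's misalignment are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical structured data: each example has two d-dimensional patches.
# Patch 1 carries the label-dependent signal y * mu; patch 2 is pure noise.
d, n, m = 20, 200, 10          # patch dim, sample count, student filters (assumed)
mu = np.zeros(d); mu[0] = 2.0  # signal direction and strength (assumed)

y = rng.choice([-1.0, 1.0], size=n)          # true labels
signal = y[:, None] * mu                     # (n, d)
noise = 0.5 * rng.standard_normal((n, d))    # (n, d)
X = np.stack([signal, noise], axis=1)        # (n, 2, d): two patches per example

# Weak teacher: one fixed linear filter applied to both patches, a stand-in
# for a pretrained linear CNN that is only imperfectly aligned with mu.
w_T = mu / np.linalg.norm(mu) + 0.8 * rng.standard_normal(d)
pseudo = np.sign((X @ w_T).sum(axis=1))      # pseudo-labels; some are flipped

# Strong student: two-layer ReLU CNN with m filters shared across patches and
# a fixed random second layer (a common simplification in this line of theory).
W = 0.01 * rng.standard_normal((m, d))
a = rng.choice([-1.0, 1.0], size=m) / m

def forward(W, X):
    pre = np.einsum('md,npd->nmp', W, X)                    # filter responses
    return np.einsum('m,nmp->n', a, np.maximum(pre, 0.0))   # sum over filters, patches

lr = 0.5
for _ in range(300):
    # Gradient descent on logistic loss against the teacher's pseudo-labels.
    f = forward(W, X)
    g = -pseudo / (1.0 + np.exp(np.clip(pseudo * f, -30, 30)))  # dloss/df
    pre = np.einsum('md,npd->nmp', W, X)
    mask = (pre > 0).astype(float)                           # ReLU gate
    grad = np.einsum('n,m,nmp,npd->md', g, a, mask, X) / n
    W -= lr * grad

teacher_acc = np.mean(pseudo == y)                 # teacher vs. true labels
student_acc = np.mean(np.sign(forward(W, X)) == y) # student vs. true labels
```

Comparing `student_acc` with `teacher_acc` over varying `n` and the noise scale is how one would probe the data-scarce vs. data-abundant regimes empirically; the paper's contribution is characterizing these regimes and their transition boundaries analytically rather than by simulation.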