Discrepancies are Virtue: Weak-to-Strong Generalization through Lens of Intrinsic Dimension

📅 2025-02-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates the mechanism by which a strong student model trained on pseudo-labels generated by a weak teacher can outperform that teacher. Viewing finetuning through its intrinsically low-dimensional structure, the paper proposes an explanation: in the components of the weak teacher's feature subspace that lie outside the strong student's subspace, the teacher's variance is suppressed by a factor scaling as the student's intrinsic dimension over the number of pseudo-labels, yielding a "virtue of discrepancy" principle. The analysis provides an exact characterization of the variance that dominates the weak-to-strong (W2S) generalization error. Combining intrinsic dimension arguments, ridgeless regression theory, and spectral subspace decomposition, it builds an analytically tractable framework for pseudo-label finetuning. Empirical validation on synthetic regression problems and real vision tasks confirms the mechanism and uncovers the scaling of sample complexity and performance gap recovery.
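To make the setup concrete, below is a minimal synthetic sketch of W2S pseudo-label finetuning in a ridgeless (minimum-norm) regression setting. The Gaussian features, random orthonormal subspaces, and all dimensions, sample sizes, and noise levels are illustrative assumptions rather than the paper's construction.

```python
# Minimal sketch of weak-to-strong (W2S) pseudo-label finetuning with
# ridgeless (minimum-norm) regression. All sizes below are assumed for
# illustration only, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)
d, d_w, d_s = 200, 10, 25      # ambient dim, weak / strong intrinsic dims (assumed)
n_w, N = 500, 2000             # weak-teacher labels, pseudo-labels for W2S (assumed)
sigma = 0.5                    # label noise level (assumed)

# Ground-truth linear target in the ambient space.
beta = rng.normal(size=d) / np.sqrt(d)

# Random low-dimensional feature subspaces for the weak and strong models.
V_w = np.linalg.qr(rng.normal(size=(d, d_w)))[0]   # d x d_w orthonormal basis
V_s = np.linalg.qr(rng.normal(size=(d, d_s)))[0]   # d x d_s orthonormal basis

def ridgeless_fit(X, y):
    """Minimum-norm least-squares solution (ridgeless regression)."""
    return np.linalg.pinv(X) @ y

# 1) Train the weak teacher on noisy labels in its own feature subspace.
X_w = rng.normal(size=(n_w, d))
y_w = X_w @ beta + sigma * rng.normal(size=n_w)
theta_w = ridgeless_fit(X_w @ V_w, y_w)

# 2) The weak teacher produces pseudo-labels on N fresh unlabeled inputs.
X_u = rng.normal(size=(N, d))
pseudo = (X_u @ V_w) @ theta_w

# 3) The strong student is finetuned on the pseudo-labels in its subspace V_s.
theta_s = ridgeless_fit(X_u @ V_s, pseudo)

# Compare test risk of the weak teacher and the W2S student on held-out data.
X_test = rng.normal(size=(5000, d))
y_test = X_test @ beta
err_w = np.mean(((X_test @ V_w) @ theta_w - y_test) ** 2)
err_s = np.mean(((X_test @ V_s) @ theta_s - y_test) ** 2)
print(f"weak teacher test risk: {err_w:.4f}")
print(f"W2S student test risk:  {err_s:.4f}")
```

The student never sees true labels: it is fit purely on the teacher's pseudo-labels, which is the W2S setting the summary above describes.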

📝 Abstract
Weak-to-strong (W2S) generalization is a type of finetuning (FT) where a strong (large) student model is trained on pseudo-labels generated by a weak teacher. Surprisingly, W2S FT often outperforms the weak teacher. We seek to understand this phenomenon through the observation that FT often occurs in intrinsically low-dimensional spaces. Leveraging the low intrinsic dimensionality of FT, we analyze W2S in the ridgeless regression setting from a variance reduction perspective. For a strong student–weak teacher pair with sufficiently expressive low-dimensional feature subspaces $\mathcal{V}_s, \mathcal{V}_w$, we provide an exact characterization of the variance that dominates the generalization error of W2S. This unveils a virtue of discrepancy between the strong and weak models in W2S: the variance of the weak teacher is inherited by the strong student in $\mathcal{V}_s \cap \mathcal{V}_w$, while reduced by a factor of $\dim(\mathcal{V}_s)/N$ in the subspace of discrepancy $\mathcal{V}_w \setminus \mathcal{V}_s$ with $N$ pseudo-labels for W2S. Further, our analysis casts light on the sample complexities and the scaling of performance gap recovery in W2S. The analysis is supported with experiments on both synthetic regression problems and real vision tasks.
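The variance claim in the abstract can be written schematically as below. This is a paraphrase of the abstract rather than the paper's exact theorem; the restriction notation $\mathrm{Var}_w|_{\mathcal{V}}$ for the weak teacher's variance within a subspace is introduced here purely for illustration.

```latex
% Schematic reading of the abstract's variance claim (not the paper's exact statement):
% variance is inherited on the overlap of the feature subspaces and suppressed,
% by a factor dim(V_s)/N, on the discrepancy subspace.
\[
\mathrm{Var}_{\mathrm{W2S}}
  \;\approx\;
  \underbrace{\left.\mathrm{Var}_{w}\right|_{\mathcal{V}_s \cap \mathcal{V}_w}}_{\text{inherited from the weak teacher}}
  \;+\;
  \underbrace{\frac{\dim(\mathcal{V}_s)}{N}\,
              \left.\mathrm{Var}_{w}\right|_{\mathcal{V}_w \setminus \mathcal{V}_s}}_{\text{suppressed on the discrepancy subspace}}
\]
```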
Problem

Research questions and friction points this paper is trying to address.

Understanding why a strong student finetuned on weak pseudo-labels can outperform its teacher
Analyzing variance reduction in intrinsically low-dimensional finetuning
Explaining how the discrepancy between weak and strong feature subspaces benefits performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Exact variance characterization of W2S generalization error in ridgeless regression
Analysis leveraging the low intrinsic dimensionality of finetuning
Discrepancy-driven variance reduction for pseudo-label finetuning