On the Emergence of Weak-to-Strong Generalization: A Bias-Variance Perspective

📅 2025-05-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates the weak-to-strong generalization (W2SG) phenomenon, where a strong student model trained on labels from a weaker teacher ultimately surpasses the teacher's performance. Grounded in a bias-variance decomposition via generalized Bregman divergences, the analysis shows that the expected population risk gap between student and teacher is quantified by the expected prediction misfit between the two models, while removing restrictive assumptions from earlier work, most notably convexity of the student's hypothesis class. The theory further shows that W2SG is more likely to emerge when the student converges toward the teacher's posterior mean rather than mimicking an individual teacher's pointwise predictions. Building on this, the paper advocates the reverse cross-entropy loss, which is less sensitive to the teacher's predictive uncertainty, mitigates overfitting to the teacher's supervision, and reduces the entropy of the student's predictions. Experiments across diverse benchmarks show consistent improvements in student generalization, supporting both the theoretical insights and the algorithmic design.

📝 Abstract
Weak-to-strong generalization (W2SG) refers to the phenomenon where a strong student model, trained on a dataset labeled by a weak teacher, ultimately outperforms the teacher on the target task. Recent studies attribute this performance gain to the prediction misfit between the student and teacher models. In this work, we theoretically investigate the emergence of W2SG through a generalized bias-variance decomposition of Bregman divergence. Specifically, we show that the expected population risk gap between the student and teacher is quantified by the expected misfit between the two models. While this aligns with previous results, our analysis removes several restrictive assumptions required in earlier works, most notably the convexity of the student's hypothesis class. Moreover, we show that W2SG is more likely to emerge when the student model approximates its posterior mean teacher, rather than mimicking an individual teacher. Using a concrete example, we demonstrate that if the student model has significantly larger capacity than the teacher, it can indeed converge to this posterior mean. Our analysis also suggests that avoiding overfitting to the teacher's supervision and reducing the entropy of the student's predictions further facilitate W2SG. In addition, we show that the reverse cross-entropy loss, unlike the standard forward cross-entropy, is less sensitive to the predictive uncertainty of the teacher. Finally, we empirically verify our theoretical insights and demonstrate that incorporating the reverse cross-entropy loss consistently improves student performance.
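As a concrete special case of the decomposition the abstract invokes (squared loss; the paper's Bregman-divergence version generalizes this, and the notation here is illustrative rather than the paper's own), averaging the student-teacher misfit over the randomness of the teacher $f$ gives, for a fixed student prediction $s(x)$:

$$
\mathbb{E}_{f}\big[\,(f(x)-s(x))^{2}\,\big]
= \big(s(x)-\bar f(x)\big)^{2} + \operatorname{Var}_{f}\!\big[f(x)\big],
\qquad \bar f(x) = \mathbb{E}_{f}\big[f(x)\big].
$$

The variance term does not depend on the student, so the expected misfit is minimized exactly when $s$ equals the posterior mean teacher $\bar f$, which is the intuition behind the claim that approximating the posterior mean, rather than an individual teacher, favors W2SG.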
Problem

Research questions and friction points this paper is trying to address.

Explores weak-to-strong generalization via bias-variance decomposition
Analyzes student-teacher misfit impact on performance gap
Proposes reverse cross-entropy loss for better student learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bregman divergence decomposes student-teacher risk gap
Student approximates posterior mean, not individual teacher
Reverse cross-entropy loss reduces teacher uncertainty sensitivity
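The reverse cross-entropy idea in the bullets above can be sketched in a few lines of NumPy (a minimal illustration under our own notation, not the paper's implementation; the example distributions are made up). Forward cross-entropy weights the log-probabilities of the student by the teacher's probabilities, so an uncertain teacher target dominates the loss; reverse cross-entropy moves the teacher's probabilities inside the logarithm, damping sensitivity to teacher uncertainty.

```python
import numpy as np

def forward_ce(teacher_p, student_p, eps=1e-8):
    """Standard distillation loss H(teacher, student): teacher probabilities
    weight the log-probabilities of the student."""
    return float(-(teacher_p * np.log(student_p + eps)).sum(axis=-1).mean())

def reverse_ce(teacher_p, student_p, eps=1e-8):
    """Reverse cross-entropy H(student, teacher): the teacher's probabilities
    sit inside the log, so the student is not forced to match every
    low-confidence teacher label."""
    return float(-(student_p * np.log(teacher_p + eps)).sum(axis=-1).mean())

# Toy example (made-up numbers): an uncertain teacher vs. a confident student.
teacher_p = np.array([[0.4, 0.3, 0.3]])
student_p = np.array([[0.9, 0.05, 0.05]])

print(f"forward CE: {forward_ce(teacher_p, student_p):.3f}")
print(f"reverse CE: {reverse_ce(teacher_p, student_p):.3f}")
```

For this uncertain teacher, the forward loss penalizes the student's confident prediction far more than the reverse loss does; in training, the reverse term would be optimized with respect to the student's logits while the teacher's distribution is held fixed.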
Gengze Xu
Gaoling School of Artificial Intelligence, Renmin University of China
Wei Yao
Gaoling School of Artificial Intelligence, Renmin University of China
Ziqiao Wang
Assistant Professor of Computer Science, Tongji University
Machine Learning · Statistical Learning Theory · Deep Learning · Information Theory
Yong Liu
Gaoling School of Artificial Intelligence, Renmin University of China