Weak-to-Strong Generalization is Nearly Inevitable (in Linear Models)

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study investigates the phenomenon wherein a student model, fine-tuned solely using feedback from a weaker teacher model, can outperform its teacher even when the student has no advantage in model capacity. Focusing on standard linear logistic regression, the authors demonstrate that this "weak-to-strong generalization" occurs under mild assumptions on the data distribution. Through formal analysis, they establish that such performance inversion is not only possible but nearly inevitable across most teacher–student pairs, challenging the prevailing theoretical view that attributes student superiority primarily to disparities in model capacity. The findings indicate that teacher supervision alone, without any architectural advantage for the student, can suffice for generalization gains in linear settings.

📝 Abstract

Weak-to-strong generalization is a phenomenon in post-training whereby a strong student model, when finetuned solely with feedback from a weaker teacher, can not only surpass the teacher but also improve upon its own capabilities. Recent work of Burns et al. (2023) demonstrated that this can occur in the setting of frontier language models, and there has since been a flurry of both empirical work trying to exploit this phenomenon and theoretical work attempting to understand it. In this work, we demonstrate that weak-to-strong generalization occurs in standard linear logistic regression, under mild distributional assumptions on the data. Indeed, we show that this happens for most student-teacher pairs, suggesting that weak-to-strong generalization is almost inevitable, even in this basic setting. Notably, our setting does not require the student to be more expressive or have more model capacity in any way compared to the teacher, which runs contrary to the prevailing theoretical belief that a mismatch in model capacity is a central mechanism of weak-to-strong generalization.
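To make the setting concrete, here is a minimal toy sketch of a linear student finetuned on a linear teacher's labels. It is not the paper's construction or proof. Everything in it is an assumption chosen for illustration: standard Gaussian features, noiseless ground-truth labels, hard teacher labels, plain gradient descent on the logistic loss, a hand-picked teacher and initial student that err in opposite directions, and the hypothetical helper names `dot`, `sigmoid`, `accuracy`, and `finetune`.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def accuracy(w, w_true, xs):
    """Fraction of points on which sign(w.x) agrees with sign(w_true.x)."""
    hits = sum((dot(w, x) >= 0) == (dot(w_true, x) >= 0) for x in xs)
    return hits / len(xs)

def finetune(w_init, xs, labels, lr=0.1, steps=60):
    """Gradient descent on the mean logistic loss with labels in {0, 1}.
    Returns every iterate, starting with w_init."""
    w = list(w_init)
    path = [list(w)]
    n = len(xs)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in zip(xs, labels):
            err = sigmoid(dot(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
        path.append(list(w))
    return path

rng = random.Random(0)
d, n = 5, 1000

# Assumed geometry: teacher and initial student each sit 45 degrees
# from the ground truth, on opposite sides of it.
w_true = [1.0] + [0.0] * (d - 1)
w_teacher = [1.0, 1.0] + [0.0] * (d - 2)
w_student0 = [1.0, -1.0] + [0.0] * (d - 2)

train = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
test = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]

# The student only ever sees the teacher's labels, never the true ones.
teacher_labels = [1.0 if dot(w_teacher, x) >= 0 else 0.0 for x in train]
path = finetune(w_student0, train, teacher_labels)

teacher_acc = accuracy(w_teacher, w_true, test)
student_accs = [accuracy(w, w_true, test) for w in path]

print("teacher accuracy:        ", teacher_acc)
print("student accuracy at init:", student_accs[0])
print("best student iterate:    ", max(student_accs))
```

In this hand-picked configuration the student's weight vector drifts from its starting direction toward the teacher's and passes close to the ground-truth direction on the way, so some intermediate iterate is more accurate than the teacher. Picking the best iterate here uses the true labels, which is only done to show the effect; how the paper formalizes and proves the result is not described in this summary.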

Problem

Research questions and friction points this paper is trying to address.

weak-to-strong generalization

linear models

student-teacher learning

model capacity

logistic regression

Innovation

Methods, ideas, or system contributions that make the work stand out.

weak-to-strong generalization

linear models

logistic regression