Weak-to-Strong Generalization under Distribution Shifts

📅 2025-10-24
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Under distribution shift, using weak models to supervise strong models often fails, with the strong model ending up worse than its weak supervisors (performance reversal). To address this, we propose RAVEN, a framework that jointly optimizes, end to end, both a weighted ensemble of weak models and the parameters of the strong model. A learnable weighting mechanism dynamically identifies high-confidence weak supervision signals, and out-of-distribution (OOD) detection is combined with adaptive aggregation strategies to enable robust weak-to-strong knowledge transfer. Across multiple OOD benchmarks, RAVEN improves over state-of-the-art baselines by more than 30% while matching or surpassing in-distribution performance. These results demonstrate RAVEN's effectiveness, generalizability, and practical utility for robust supervised learning under distribution shift.

📝 Abstract
As future superhuman models become increasingly complex, accurately supervising their behavior may exceed human capabilities. Recent works have demonstrated that in such scenarios, weak models can effectively supervise strong models, a phenomenon known as weak-to-strong generalization. However, we find that naive weak-to-strong generalization fails under distribution shifts, often leading to worse performance of the strong model than its weak supervisors. To address this, we propose RAVEN, a robust weak-to-strong generalization framework that dynamically learns the optimal combinations of weak models in addition to parameters of the strong model. We demonstrate the effectiveness of RAVEN on image classification, text classification, and preference alignment tasks. RAVEN outperforms alternative baselines by over 30% on out-of-distribution tasks while matching or surpassing existing methods on in-distribution tasks. Moreover, our results show that RAVEN assigns higher weights to more accurate weak models, demonstrating its ability to automatically identify trustworthy supervision.
Problem

Research questions and friction points this paper is trying to address.

Naive weak-to-strong generalization fails under distribution shifts
Strong models can end up performing worse than their weak supervisors on out-of-distribution data
Weak supervision sources vary in reliability, and it is unclear how to weight them
Innovation

Methods, ideas, or system contributions that make the work stand out.

Learnable weighting mechanism dynamically combines weak models into an ensemble
Ensemble weights and strong-model parameters are optimized jointly, end to end
Higher weights are automatically assigned to more accurate weak models, identifying trustworthy supervision
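The core learnable-weighting idea can be sketched in a few lines. Note this is an illustrative toy, not the paper's implementation: the three weak models, the data, and the MSE objective against synthetic targets are all assumptions made here for demonstration (RAVEN learns the weights jointly with the strong model rather than against ground-truth labels). The sketch shows how gradient descent on softmax-normalized weights drives weight toward the more accurate weak supervisor:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical predictions from 3 weak models on 4 binary examples.
weak_probs = np.array([
    [0.9, 0.1, 0.8, 0.2],   # reliable weak model
    [0.6, 0.4, 0.7, 0.3],   # mediocre weak model
    [0.2, 0.8, 0.3, 0.7],   # unreliable (inverted) weak model
])
# Synthetic targets, used here only as a stand-in supervision signal.
targets = np.array([1.0, 0.0, 1.0, 0.0])

w_logits = np.zeros(3)      # learnable ensemble weights, in logit space
lr = 0.5
for _ in range(300):
    w = softmax(w_logits)
    ens = w @ weak_probs                                     # weighted ensemble labels
    g = 2.0 * (ens - targets) @ weak_probs.T / len(targets)  # dLoss/dw for MSE
    w_logits -= lr * w * (g - np.dot(w, g))                  # chain rule through softmax

w = softmax(w_logits)
print(w)  # the reliable weak model receives by far the largest weight
```

Parameterizing the weights in logit space keeps them on the simplex without explicit constraints, so they can be trained with the same optimizer as the strong model's parameters.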