Self-Soupervision: Cooking Model Soups without Labels

📅 2026-02-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work extends model soups to self-supervised learning (SSL), so that souping no longer needs labeled data. Existing soups fine-tune with a supervised loss on ground-truth labels, which rules them out in label-scarce settings and keeps them from using unlabeled data from a new task or a distribution shift. Self-Soupervision fine-tunes a starting model (the stock) into several ingredients with self-supervised losses on unlabeled data, then mixes the ingredients' parameters into a single model (the soup). Ingredients can differ in their SSL hyperparameters and even in their SSL algorithms (MAE, MoCoV3, MMCR). For robustness, Self-Souping is run on corrupted test data and the result is then fine-tuned back on uncorrupted training data, which improves robustness by +3.5% on ImageNet-C and +7% on LAION-C. Soups of MAE, MoCoV3, and MMCR ingredients are also more accurate than any single SSL ingredient.

📝 Abstract

Model soups are strange and strangely effective combinations of parameters. They take a model (the stock), fine-tune it into multiple models (the ingredients), and then mix their parameters back into one model (the soup) to improve predictions. While all known soups require supervised learning, and optimize the same loss on labeled data, our recipes for Self-Soupervision generalize soups to self-supervised learning (SSL). Our Self-Souping lets us flavor ingredients on new data sources, e.g. from unlabeled data from a task for transfer or from a shift for robustness. We show that Self-Souping on corrupted test data, then fine-tuning back on uncorrupted train data, boosts robustness by +3.5% (ImageNet-C) and +7% (LAION-C). Self-Soupervision also unlocks countless SSL algorithms to cook the diverse ingredients needed for more robust soups. We show for the first time that ingredients can differ in their SSL hyperparameters and, more surprisingly, in their SSL algorithms. We cook soups of MAE, MoCoV3, and MMCR ingredients that are more accurate than any one single SSL ingredient.
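
The mixing step described in the abstract can be illustrated with a minimal sketch. It assumes the simplest possible recipe, a (weighted) average of ingredient parameters; the abstract does not state the paper's exact mixing rule, so this is not the authors' implementation. The names `Params` and `make_soup` are hypothetical, and plain Python lists stand in for real model tensors.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical stand-in for a model's parameters: name -> flat list of values.
Params = Dict[str, List[float]]


def make_soup(ingredients: Sequence[Params],
              weights: Optional[Sequence[float]] = None) -> Params:
    """Mix ingredients into one soup by averaging their parameters.

    Assumes every ingredient was fine-tuned from the same stock, so all
    ingredients share parameter names and sizes.
    """
    if not ingredients:
        raise ValueError("need at least one ingredient")
    if weights is None:
        # Uniform soup: every ingredient contributes equally.
        weights = [1.0 / len(ingredients)] * len(ingredients)
    if len(weights) != len(ingredients):
        raise ValueError("one weight per ingredient is required")

    names = ingredients[0].keys()
    soup: Params = {}
    for name in names:
        size = len(ingredients[0][name])
        for ingredient in ingredients:
            if ingredient.keys() != names or len(ingredient[name]) != size:
                raise ValueError("ingredients must share names and sizes")
        # Element-wise weighted sum across ingredients.
        soup[name] = [
            sum(w * ingredient[name][i]
                for w, ingredient in zip(weights, ingredients))
            for i in range(size)
        ]
    return soup


# Toy ingredients, e.g. fine-tuned from one stock with different SSL settings.
ingredient_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
ingredient_b = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
soup = make_soup([ingredient_a, ingredient_b])
```

Averaging only makes sense when the ingredients share one architecture and start from the same stock, which is why the sketch rejects mismatched names or sizes rather than trying to align them.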

Problem

Research questions and friction points this paper is trying to address.

model soups

self-supervised learning

robustness

transfer learning

unlabeled data

Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-Soupervision

model soups

self-supervised learning