Model soups need only one ingredient

📅 2026-02-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the trade-off between in-distribution accuracy and out-of-distribution robustness commonly observed in fine-tuned large models, as well as the high computational cost of existing model-fusion techniques. To this end, the authors propose MonoSoup, a post-hoc, data-free, and hyperparameter-free method that achieves performance comparable to multi-model ensembles using only a single fine-tuned checkpoint. MonoSoup is the first to apply singular value decomposition (SVD) to decompose weight updates layer-wise, adaptively reweighting each layer's update directions based on an entropy-derived effective rank. This approach enhances robustness while preserving task-specific adaptation. Experiments on CLIP and Qwen models demonstrate that MonoSoup significantly outperforms baseline methods across tasks involving natural distribution shifts and mathematical reasoning, achieving a favorable balance between accuracy and generalization without any additional training overhead.

📝 Abstract
Fine-tuning large pre-trained models on a target distribution often improves in-distribution (ID) accuracy, but at the cost of out-of-distribution (OOD) robustness as representations specialize to the fine-tuning data. Weight-space ensembling methods, such as Model Soups, mitigate this effect by averaging multiple checkpoints, but they are computationally prohibitive, requiring the training and storage of dozens of fine-tuned models. In this paper, we introduce MonoSoup, a simple, data-free, hyperparameter-free, post-hoc method that achieves a strong ID-OOD balance using only a single checkpoint. Our method applies Singular Value Decomposition (SVD) to each layer's update and decomposes it into high-energy directions that capture task-specific adaptation and low-energy directions that introduce noise but may still encode residual signals useful for robustness. MonoSoup then uses entropy-based effective rank to automatically re-weight these components with layer-wise coefficients that account for the spectral and geometric structure of the model. Experiments on CLIP models fine-tuned on ImageNet and evaluated under natural distribution shifts, as well as on Qwen language models tested on mathematical reasoning and multiple-choice benchmarks, show that this plug-and-play approach is a practical and effective alternative to multi-checkpoint methods, retaining much of their benefit without their computational overhead.
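The per-layer SVD decomposition and entropy-based reweighting described in the abstract can be sketched roughly as follows. `effective_rank` follows the standard entropy-of-the-spectrum definition; the coefficient rule `alpha` (effective rank divided by full rank) is an illustrative assumption, not the paper's exact formula, and `monosoup_layer` is a hypothetical helper name:

```python
import numpy as np

def effective_rank(singular_values, eps=1e-12):
    # Entropy-based effective rank: the exponential of the Shannon
    # entropy of the normalized singular-value spectrum.
    p = singular_values / (singular_values.sum() + eps)
    entropy = -(p * np.log(p + eps)).sum()
    return float(np.exp(entropy))

def monosoup_layer(w_pretrained, w_finetuned):
    # Decompose the fine-tuning update of one layer via SVD, then
    # rescale it with a layer-wise coefficient derived from the
    # effective rank. The coefficient rule here is an assumed,
    # simplified stand-in for the paper's actual scheme.
    delta = w_finetuned - w_pretrained
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    alpha = effective_rank(s) / len(s)  # in (0, 1]: low effective rank -> smaller step
    return w_pretrained + alpha * (u @ np.diag(s) @ vt)
```

Because the method is post-hoc and data-free, such a routine could be applied independently to every weight matrix of a single fine-tuned checkpoint, with no retraining or validation data required.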
Problem

Research questions and friction points this paper is trying to address.

out-of-distribution robustness
fine-tuning
model ensembling
computational efficiency
distribution shift
Innovation

Methods, ideas, or system contributions that make the work stand out.

Model Soups
Singular Value Decomposition
effective rank
out-of-distribution robustness
post-hoc adaptation