Revisiting Weight Averaging for Model Merging

📅 2024-12-11
🏛️ arXiv.org
📈 Citations: 3
Influential: 0
🤖 AI Summary
Model merging via parameter averaging enables zero-shot multi-task learning but suffers from cross-task parameter interference, degrading performance. This work reveals that weight averaging implicitly induces task vectors centered around the average itself, and that the top singular vectors of these centered task vectors encode most of the task-specific knowledge. Building on this insight, it proposes a "centering + low-rank approximation" fusion framework: task vectors are first centered to remove shared components; then singular value decomposition (SVD) is applied for low-rank approximation, effectively suppressing interference. The method yields robust and scalable performance gains on vision benchmarks across varying numbers of tasks and model sizes, and achieves competitive results on NLP multi-task benchmarks. The core contributions are twofold: (i) establishing a theoretical connection between weight averaging and task-vector centering; and (ii) introducing a geometry-inspired low-rank fusion paradigm grounded in task-vector structure.
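The pipeline described above (average, center, low-rank approximate, re-merge) can be sketched as follows. This is a minimal illustrative reading, not the paper's exact algorithm: the rank `k`, the scaling coefficient `alpha`, and the use of a plain mean when adding the approximated task vectors back are all assumptions for the sake of a runnable toy example.

```python
import numpy as np

def lowrank(mat, k):
    """Keep only the top-k singular components of a matrix (SVD truncation)."""
    U, S, Vt = np.linalg.svd(mat, full_matrices=False)
    return (U[:, :k] * S[:k]) @ Vt[:k]

def merge_layer(task_weights, k=2, alpha=1.0):
    """Merge one layer's weight matrices from several fine-tuned models.

    task_weights: list of (d_out, d_in) arrays, one per fine-tuned model.
    Steps: (1) compute the weight average, (2) form centered task vectors
    around that average, (3) low-rank approximate each to suppress
    interference, (4) add the approximations back to the average.
    alpha and the mean-based recombination are illustrative choices.
    """
    avg = np.mean(task_weights, axis=0)            # weight average
    centered = [W - avg for W in task_weights]     # centered task vectors
    approx = [lowrank(T, k) for T in centered]     # keep top singular vectors
    return avg + alpha * np.mean(approx, axis=0)

# toy usage: three "fine-tuned" 8x8 layers drifting from a shared base
rng = np.random.default_rng(0)
base = rng.normal(size=(8, 8))
models = [base + rng.normal(scale=0.1, size=(8, 8)) for _ in range(3)]
merged = merge_layer(models, k=2)
print(merged.shape)
```

In a real model each layer (or each flattened parameter group) would be merged this way independently; the centering step is what distinguishes this from applying SVD directly to the raw task vectors.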

📝 Abstract
Model merging aims to build a multi-task learner by combining the parameters of individually fine-tuned models without additional training. While a straightforward approach is to average model parameters across tasks, this often results in suboptimal performance due to interference among parameters across tasks. In this paper, we present intriguing results that weight averaging implicitly induces task vectors centered around the weight average itself, and that applying a low-rank approximation to these centered task vectors significantly improves merging performance. Our analysis shows that centering the task vectors effectively reduces task interference and that most of the task-specific knowledge is concentrated in the top singular vectors. Our method demonstrates robust and scalable performance on vision benchmarks across varying numbers of tasks and model sizes. Furthermore, we observe that our approach is applicable to natural language processing tasks with competitive performance.
Problem

Research questions and friction points this paper is trying to address.

Parameter averaging across tasks suffers from cross-task interference, degrading merged performance
Building a multi-task learner from individually fine-tuned models without additional training
Scaling model merging across varying numbers of tasks, model sizes, and modalities (vision and NLP)
Innovation

Methods, ideas, or system contributions that make the work stand out.

Weight averaging implicitly induces task vectors centered around the average itself
Low-rank (SVD) approximation of centered task vectors improves merging performance
Centering reduces interference; task-specific knowledge concentrates in top singular vectors