FW-Merging: Scaling Model Merging with Frank-Wolfe Optimization

📅 2025-03-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the poor adaptability and limited scalability of model merging in open ecosystems with heterogeneous fine-tuned models—where task and architecture metadata may be missing and model sizes keep growing—this paper formulates model merging as a constrained optimization problem and proposes a Frank-Wolfe-style iterative framework. It combines greedy selection from a model pool with local weighted fusion, enabling large-scale merging without data access and with constant memory overhead. The method is orthogonal to and compatible with existing merging techniques. Evaluated on 20 computer vision tasks, it improves performance by 15.3% when merging 16 relevant models and remains stable even when 16 irrelevant models are included. It surpasses the state-of-the-art data-free merging method by 32.8% and outperforms the data-informed Adamerging by 8.39%.

📝 Abstract
Model merging has emerged as a promising approach for multi-task learning (MTL), offering a data-efficient alternative to conventional fine-tuning. However, with the rapid development of the open-source AI ecosystem and the increasing availability of fine-tuned foundation models, existing model merging methods face two key limitations: (i) They are primarily designed for in-house fine-tuned models, making them less adaptable to diverse model sources with partially unknown model and task information, (ii) They struggle to scale effectively when merging numerous model checkpoints. To address these challenges, we formulate model merging as a constrained optimization problem and introduce a novel approach: Frank-Wolfe Merging (FW-Merging). Inspired by Frank-Wolfe optimization, our approach iteratively selects the most relevant model in the pool to minimize a linear approximation of the objective function and then executes a local merging similar to the Frank-Wolfe update. The objective function is designed to capture the desired behavior of the target-merged model, while the fine-tuned candidate models define the constraint set. More importantly, FW-Merging serves as an orthogonal technique for existing merging methods, seamlessly integrating with them to further enhance accuracy performance. Our experiments show that FW-Merging scales across diverse model sources, remaining stable with 16 irrelevant models and improving by 15.3% with 16 relevant models on 20 CV tasks, while maintaining constant memory overhead, unlike the linear overhead of data-informed merging methods. Compared with the state-of-the-art approaches, FW-Merging surpasses the data-free merging method by 32.8% and outperforms the data-informed Adamerging by 8.39% when merging 20 ViT models.
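The core loop described in the abstract—select the candidate model that minimizes a linear approximation of the objective, then take a convex Frank-Wolfe update—can be sketched in a few lines. This is a hedged toy illustration, not the paper's implementation: the candidate "models" are plain weight vectors, and the paper's data-free behavioral objective is stood in for by a simple quadratic `f(theta) = ||theta - target||^2` (the `target`, `grad_fn`, and step-size schedule here are illustrative assumptions).

```python
import numpy as np

def fw_merge(candidates, grad_fn, theta0, steps=1000):
    """Frank-Wolfe-style merging sketch (toy setup, not the paper's code).

    At each step: greedily pick the candidate minimizing the linear
    approximation <grad f(theta), c> over the pool, then merge it in
    with a convex combination (the local Frank-Wolfe update).
    """
    theta = theta0.copy()
    for t in range(steps):
        g = grad_fn(theta)
        # Greedy selection: linear minimization over the candidate pool
        scores = [g @ c for c in candidates]
        best = candidates[int(np.argmin(scores))]
        gamma = 2.0 / (t + 2)  # standard diminishing FW step size
        theta = (1 - gamma) * theta + gamma * best  # local weighted fusion
    return theta

# Toy example: three "models" in R^2 whose convex hull contains the target
pool = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
target = np.array([0.6, 0.7])                 # hypothetical desired behavior
grad = lambda th: 2.0 * (th - target)         # gradient of ||theta - target||^2
merged = fw_merge(pool, grad, theta0=pool[0])
```

Note the memory profile this sketch shares with the method: only the current merged weights `theta` (plus one gradient) are kept across iterations, so the overhead is constant in the number of candidates, unlike data-informed methods whose state grows linearly with the pool.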
Problem

Research questions and friction points this paper is trying to address.

Addresses the difficulty of merging models from diverse sources with partially unknown model and task information.
Solves scalability issues when merging numerous model checkpoints.
Enhances accuracy and stability in multi-task learning through Frank-Wolfe optimization.
Innovation

Methods, ideas, or system contributions that make the work stand out.

FW-Merging uses Frank-Wolfe optimization for model merging.
It scales effectively with diverse and numerous model checkpoints.
FW-Merging integrates with existing methods to enhance accuracy.
Hao Mark Chen
Imperial College London, UK
Shell Xu Hu
Samsung AI Center - Cambridge
Machine Learning
Wayne Luk
Professor of Computer Engineering, Imperial College London
Hardware and Architecture, Reconfigurable Computing, Design Automation
Timothy Hospedales
Samsung AI Center, Cambridge, UK
Hongxiang Fan
Imperial College London, UK