🤖 AI Summary
To address the poor adaptability and limited scalability of merging multi-source, heterogeneous fine-tuned models in open ecosystems, where task and architecture metadata are often missing and model sizes grow rapidly, this paper formulates model merging as a constrained optimization problem and proposes a Frank-Wolfe-style iterative framework. It combines model-pool-driven greedy selection with local weighted fusion, enabling large-scale merging without data access and with constant memory overhead. The method is orthogonal to, and compatible with, existing merging techniques. Evaluated on 20 computer vision tasks, it improves performance by 15.3% when merging 16 relevant models and remains robust even when 16 irrelevant models are included. It outperforms the state-of-the-art data-free method by 32.8% and surpasses the data-informed AdaMerging by 8.39%.
📝 Abstract
Model merging has emerged as a promising approach for multi-task learning (MTL), offering a data-efficient alternative to conventional fine-tuning. However, with the rapid development of the open-source AI ecosystem and the increasing availability of fine-tuned foundation models, existing model merging methods face two key limitations: (i) they are primarily designed for in-house fine-tuned models, making them less adaptable to diverse model sources with partially unknown model and task information; and (ii) they struggle to scale effectively when merging numerous model checkpoints. To address these challenges, we formulate model merging as a constrained optimization problem and introduce a novel approach: Frank-Wolfe Merging (FW-Merging). Inspired by Frank-Wolfe optimization, our approach iteratively selects the most relevant model in the pool to minimize a linear approximation of the objective function, and then executes a local merging step analogous to the Frank-Wolfe update. The objective function is designed to capture the desired behavior of the target merged model, while the fine-tuned candidate models define the constraint set. More importantly, FW-Merging serves as an orthogonal technique for existing merging methods, integrating seamlessly with them to further improve accuracy. Our experiments show that FW-Merging scales across diverse model sources, remaining stable with 16 irrelevant models and improving by 15.3% with 16 relevant models on 20 CV tasks, while maintaining constant memory overhead, unlike the linear overhead of data-informed merging methods. Compared with state-of-the-art approaches, FW-Merging surpasses the data-free merging method by 32.8% and outperforms the data-informed AdaMerging by 8.39% when merging 20 ViT models.
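The iterative scheme described in the abstract (greedy selection of the candidate minimizing a linear approximation of the objective, followed by a Frank-Wolfe-style convex update) can be sketched as follows. This is a minimal illustration on flattened weight vectors, not the paper's implementation: `objective_grad`, the step-size schedule, and the candidate pool representation are all assumptions made for clarity.

```python
import numpy as np

def fw_merge(theta_init, candidates, objective_grad, num_steps=10):
    """Frank-Wolfe-style merging sketch (hypothetical, simplified).

    theta_init:     flattened weights of the initial merged model
    candidates:     list of flattened fine-tuned model weights
                    (these define the constraint set)
    objective_grad: callable giving the gradient of the merging
                    objective at the current iterate (a stand-in for
                    whatever proxy objective is used in practice)
    """
    theta = theta_init.copy()
    for t in range(num_steps):
        g = objective_grad(theta)
        # Greedy selection: pick the candidate s minimizing the
        # linear approximation <g, s> of the objective.
        scores = [float(g @ s) for s in candidates]
        s_best = candidates[int(np.argmin(scores))]
        # Local merge: classic Frank-Wolfe convex-combination step
        # toward the selected model; memory stays constant because
        # only the current iterate is kept.
        gamma = 2.0 / (t + 2.0)
        theta = (1.0 - gamma) * theta + gamma * s_best
    return theta
```

As a toy check, with a quadratic objective pulling toward one candidate, the iterate converges to that candidate:

```python
target = np.array([1.0, 0.0])
pool = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
merged = fw_merge(np.zeros(2), pool, lambda th: th - target)
```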