Reinforced Model Merging

📅 2025-03-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing training-free model fusion methods apply uniform weighting across all parameters, degrading performance, while gradient-free search-based optimization strategies suffer from low efficiency. Method: We propose the first gradient-free, reinforcement learning–driven (PPO) layer-wise fusion framework. It introduces an environment-agent co-design mechanism for adaptive, layer-granular weight allocation; constructs a layer-wise action space; and incorporates a subset-driven sparse reward estimation scheme that accelerates reward evaluation by up to 100×. Contribution/Results: Our method achieves state-of-the-art performance across diverse vision and NLP benchmarks, significantly outperforming conventional weighted fusion and search-based baselines. It incurs negligible computational overhead, keeps the merged model lightweight, and supports efficient edge deployment, without requiring any gradient computation or fine-tuning.
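The layer-wise action space described above can be sketched as follows. This is a minimal illustrative example, not the paper's implementation: the function name `merge_layerwise` and the three-way action set ("take model A", "take model B", "average") are assumptions standing in for whatever merging operators RMM actually uses per layer.

```python
# Hypothetical sketch: each action in the layer-wise action space picks,
# per layer, how the two models' parameters are combined.
# (merge_layerwise and the 'A'/'B'/'avg' actions are illustrative.)

def merge_layerwise(state_a, state_b, actions):
    """state_a / state_b: {layer_name: list of floats} in matching order;
    actions: one of 'A', 'B', 'avg' per layer."""
    merged = {}
    for (name, wa), act in zip(state_a.items(), actions):
        wb = state_b[name]
        if act == "A":          # keep model A's layer unchanged
            merged[name] = list(wa)
        elif act == "B":        # keep model B's layer unchanged
            merged[name] = list(wb)
        else:                   # 'avg': uniform interpolation of the layer
            merged[name] = [(x + y) / 2 for x, y in zip(wa, wb)]
    return merged
```

An RL agent would then search over such per-layer action sequences, with the merged model's evaluation score serving as the reward.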

📝 Abstract
The success of large language models has garnered widespread attention for model merging techniques, especially training-free methods which combine model capabilities within the parameter space. However, two challenges remain: (1) uniform treatment of all parameters leads to performance degradation; (2) search-based algorithms are often inefficient. In this paper, we present an innovative framework termed Reinforced Model Merging (RMM), which encompasses an environment and agent tailored for merging tasks. These components interact to execute layer-wise merging actions, aiming to search the optimal merging architecture. Notably, RMM operates without any gradient computations on the original models, rendering it feasible for edge devices. Furthermore, by utilizing data subsets during the evaluation process, we addressed the bottleneck in the reward feedback phase, thereby accelerating RMM by up to 100 times. Extensive experiments demonstrate that RMM achieves state-of-the-art performance across various vision and NLP datasets and effectively overcomes the limitations of the existing baseline methods. Our code is available at https://github.com/WuDiHJQ/Reinforced-Model-Merging.
Problem

Research questions and friction points this paper is trying to address.

Optimizes model merging to avoid performance degradation
Improves efficiency of search-based merging algorithms
Enables merging without gradient computations for edge devices
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforced Model Merging framework for optimal architecture
Layer-wise merging without gradient computations
Data subsets accelerate reward feedback 100x
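The subset-driven reward estimation idea can be sketched as below. This is an assumption-laden illustration, not the authors' code: the function `subset_reward` and the `subset_frac` parameter are hypothetical names, but the mechanism matches the description, scoring each merged candidate on a small sampled subset of the validation data instead of the full set, which is where the up-to-100× speedup in the reward feedback phase comes from.

```python
import random

# Hypothetical sketch of subset-driven reward estimation: sample a small
# fraction of the validation data per reward evaluation, so each RL step
# scores the merged candidate far faster than a full-set evaluation would.

def subset_reward(score_fn, val_data, subset_frac=0.01, seed=0):
    """score_fn: per-example score (e.g. 1.0 if correct); returns the
    mean score over a random subset of roughly subset_frac * len(val_data)."""
    rng = random.Random(seed)  # seeded for reproducible reward estimates
    k = max(1, int(len(val_data) * subset_frac))
    subset = rng.sample(val_data, k)
    return sum(score_fn(x) for x in subset) / k
```

The subset average is an unbiased estimate of full-set accuracy, trading a little reward variance for a large cut in evaluation cost at every search step.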
Jiaqi Han
College of Computer Science and Technology, Zhejiang University, Hangzhou, China
Jingwen Ye
Assistant Professor, Monash University
Computer Vision
Shunyu Liu
Nanyang Technological University
Multi-Agent Learning, Reinforcement Learning, Large Language Models, Power System Control
Haofei Zhang
College of Computer Science and Technology, Zhejiang University, Hangzhou, China
Jie Song
School of Software Technology, Zhejiang University, Ningbo, China
Zunlei Feng
School of Software Technology, Zhejiang University, Ningbo, China
Mingli Song
College of Computer Science and Technology, Zhejiang University, Hangzhou, China