🤖 AI Summary
Existing large language model (LLM) merging methods rely on access to base model weights, fine-tuning, or reinforcement learning—constraints that limit practicality and scalability. Method: We propose LoRE-Merging, a training-free, gradient-free model merging framework that operates solely on task-specific weight deltas (i.e., task vectors) derived from multiple fine-tuned models. Leveraging the intrinsic low-rank structure of these deltas, LoRE-Merging estimates dominant task directions via singular value decomposition (SVD) without requiring the base model or any additional optimization. Contribution/Results: LoRE-Merging establishes the first unified merging paradigm that neither accesses the base model nor performs further parameter updates. It significantly mitigates merge interference and negative transfer, outperforms state-of-the-art merging methods on multi-task benchmarks, and better preserves both task-specific capabilities and cross-task generalization.
📝 Abstract
While most current approaches rely on further training techniques, such as fine-tuning or reinforcement learning, to enhance model capacities, model merging stands out for its ability of improving models without requiring any additional training. In this paper, we propose a unified framework for model merging based on low-rank estimation of task vectors without the need for access to the base model, named extsc{LoRE-Merging}. Our approach is motivated by the observation that task vectors from fine-tuned models frequently exhibit a limited number of dominant singular values, making low-rank estimations less prone to interference. We implement the method by formulating the merging problem as an optimization problem. Extensive empirical experiments demonstrate the effectiveness of our framework in mitigating interference and preserving task-specific information, thereby advancing the state-of-the-art performance in model merging techniques.