Beyond Parameter Arithmetic: Sparse Complementary Fusion for Distribution-Aware Model Merging

πŸ“… 2026-02-12
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Existing model merging approaches rely on heuristic operations in parameter space, which often induce functional interference, leading to degraded generalization and unstable generation. This work proposes SCF-RKL, a sparse complementary fusion framework that abandons the assumption of linear parameter interpolation and instead introduces a distribution alignment mechanism based on reverse Kullback-Leibler divergence to identify and merge complementary parameters. By enabling distribution-aware sparse updates, SCF-RKL effectively mitigates functional interference while preserving stable representations. The method achieves state-of-the-art performance across 24 benchmarks spanning reasoning, instruction following, knowledge retention, and safety, demonstrating superior generalization and generation stability compared to existing techniques.

πŸ“ Abstract
Model merging has emerged as a promising paradigm for composing the capabilities of large language models by directly operating in weight space, enabling the integration of specialized models without costly retraining. However, existing merging methods largely rely on parameter-space heuristics, which often introduce severe interference, leading to degraded generalization and unstable generation behaviors such as repetition and incoherent outputs. In this work, we propose Sparse Complementary Fusion with reverse KL (SCF-RKL), a novel model merging framework that explicitly controls functional interference through sparse, distribution-aware updates. Instead of assuming linear additivity in parameter space, SCF-RKL measures the functional divergence between models using reverse Kullback-Leibler divergence and selectively incorporates complementary parameters. This mode-seeking, sparsity-inducing design effectively preserves stable representations while integrating new capabilities. We evaluate SCF-RKL across a wide range of model scales and architectures, covering both reasoning-focused and instruction-tuned models. Extensive experiments on 24 benchmarks spanning advanced reasoning, general reasoning and knowledge, instruction following, safety, and vision classification demonstrate that SCF-RKL consistently outperforms existing model merging methods while maintaining strong generalization and generation stability.
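The core idea in the abstract can be illustrated with a toy sketch. This is not the paper's actual algorithm: the probe (treating the weight vector itself as output logits), the `keep_ratio` parameter, and the "keep the lowest-divergence deltas" selection rule are all illustrative assumptions. It only shows the general shape of reverse-KL-scored sparse merging: apply each per-parameter delta from a donor model in isolation, measure the functional shift against the base model with reverse KL, and keep a sparse subset of low-interference updates.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def reverse_kl(q, p):
    # D_KL(q || p): mode-seeking divergence of the candidate distribution q
    # from the base distribution p (zero-mass terms in q contribute nothing).
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def sparse_complementary_merge(base, donor, keep_ratio=0.5):
    # Toy probe: treat the weight vector itself as output logits (an
    # illustrative assumption, not the paper's probe). Score each
    # per-parameter delta by the reverse KL it induces when applied alone,
    # then keep only the lowest-interference fraction of updates.
    n = len(base)
    k = max(1, int(keep_ratio * n))
    p_base = softmax(base)
    scored = []
    for i in range(n):
        cand = list(base)
        cand[i] = donor[i]
        scored.append((reverse_kl(softmax(cand), p_base), i))
    scored.sort()
    keep = {i for _, i in scored[:k]}
    return [donor[i] if i in keep else base[i] for i in range(n)]
```

Under this heuristic, deltas that barely move the base model's output distribution are merged first, while high-interference deltas are zeroed out, which is one way to read the "sparse, distribution-aware updates" framing above.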
Problem

Research questions and friction points this paper is trying to address.

model merging
functional interference
parameter-space heuristics
generation stability
generalization degradation
Innovation

Methods, ideas, or system contributions that make the work stand out.

model merging
sparse fusion
distribution-aware
reverse KL divergence
functional interference
πŸ”Ž Similar Papers
No similar papers found.