Expert Merging: Model Merging with Unsupervised Expert Alignment and Importance-Guided Layer Chunking

📅 2025-09-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address critical challenges in model merging—including misalignment of expert parameters and inconsistent downstream behavior, as well as neglect of inter-layer heterogeneity—this paper proposes Expert Merging, a lightweight, unsupervised model fusion method. Its core contributions are: (1) an unsupervised expert alignment objective based on hidden states and logits; (2) an adaptive calibration mechanism that learns layer-wise coefficients using unlabeled data; and (3) Expert Merging++, which introduces an importance-guided layer partitioning strategy to explicitly model inter-layer heterogeneity and enable efficient parameter allocation. Stability is ensured via coefficient regularization, task-weighted loss, and normalized importance scoring. Evaluated on large language and multimodal models—including Mistral, InternVL, and Qwen2-VL—Expert Merging significantly outperforms both training-free and fine-tuning-based baselines, with some results even surpassing supervised mixture-of-experts training.

📝 Abstract
Model merging, which combines multiple domain-specialized experts into a single model, offers a practical path to endow Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) with broad capabilities without the cost of joint training or serving many models. However, training-free methods rely on hand-tuned coefficients, whereas training-based methods primarily align parameters rather than downstream task behavior and typically treat all layers uniformly, ignoring inter-layer heterogeneity. We introduce Expert Merging, a training-light method that learns a small set of layer-wise coefficients using only unlabeled calibration data. The coefficients are optimized to explicitly align the merged model's hidden states and logits with those of the corresponding experts, with a coefficient regularizer for stability and task-weighted losses for controllable trade-offs. To capture inter-layer variation, Expert Merging++ augments this design with importance-guided chunking: a normalized layer-importance metric, derived from learned coefficients, task-vector magnitudes, and parameter counts, allocates more chunk-wise coefficients to high-importance layers while keeping low-importance layers lightweight. The result is a label-free, parameter-efficient, and scalable approach to multi-expert model merging across LLMs and MLLMs. Across MLLM backbones (InternVL and Qwen2-VL) and the LLM backbone (Mistral), our method surpasses strong training-free and training-based merging baselines, with Expert Merging++ delivering further gains and, in some cases, even exceeding supervised Mixture Training. The source code is available at https://github.com/Littleor/ExpertMerging.
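The abstract describes merging as learning a small set of layer-wise coefficients over expert task vectors. A minimal sketch of that per-layer combination, in the common task-vector form (merged layer = base layer + Σₖ c[l][k] · (expertₖ layer − base layer)); the function name and data layout are illustrative, not the paper's actual implementation, and the coefficients `coeffs` would be the ones learned from unlabeled calibration data:

```python
def merge_layers(base, experts, coeffs):
    """Layer-wise task-vector merge (assumed form, plain lists for clarity).

    base:    list over layers of flat parameter lists
    experts: list over experts, each a layer list shaped like `base`
    coeffs:  coeffs[l][k] is the learned coefficient for layer l, expert k
    """
    merged = []
    for l, w in enumerate(base):
        out = list(w)  # start from the base layer's parameters
        for k, expert in enumerate(experts):
            for i in range(len(w)):
                # add the scaled task vector (expert minus base) for this layer
                out[i] += coeffs[l][k] * (expert[l][i] - w[i])
        merged.append(out)
    return merged
```

In the paper's setup these coefficients are not hand-tuned but optimized so the merged model's hidden states and logits match each expert's on calibration inputs.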
Problem

Research questions and friction points this paper is trying to address.

How to merge domain-specialized experts into one LLM/MLLM without labeled data or hand-tuned coefficients
How to align the merged model's downstream behavior (hidden states and logits) with each expert, rather than only its parameters
How to account for inter-layer heterogeneity instead of treating all layers uniformly
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unsupervised expert alignment with layer-wise coefficients
Importance-guided chunking for inter-layer variation
Label-free, parameter-efficient, and scalable multi-expert model merging
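Expert Merging++'s importance-guided chunking, per the abstract, scores each layer from its learned coefficients, task-vector magnitudes, and parameter counts, then gives high-importance layers more chunk-wise coefficients. A hypothetical sketch of that allocation; the product-based score, normalization, and proportional budget below are assumptions for illustration, not the paper's exact formula:

```python
def layer_importance(coeff_mags, tv_norms, param_counts):
    """Normalized per-layer importance (assumed form: product of the three signals)."""
    raw = [c * t * p for c, t, p in zip(coeff_mags, tv_norms, param_counts)]
    total = sum(raw)
    return [r / total for r in raw]  # scores sum to 1

def allocate_chunks(importance, min_chunks=1, max_chunks=8):
    """Give layers a chunk budget proportional to importance (illustrative policy)."""
    top = max(importance)
    # high-importance layers approach max_chunks; low ones stay lightweight
    return [max(min_chunks, round(max_chunks * s / top)) for s in importance]
```

Each chunk then gets its own merging coefficient, so important layers are modeled at finer granularity while the total number of learned parameters stays small.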
Authors
Dengming Zhang (Zhejiang University)
Xiaowen Ma (Zhejiang University, Huawei Noah's Ark Lab) · Computer Vision, Remote Sensing, Multi-modal, Time Series
Zhenliang Ni (Huawei Noah's Ark Lab)
Zhenkai Wu (Zhejiang University)
Han Shu (Huawei Noah's Ark Lab)
Xin Jiang (Huawei Noah's Ark Lab)
Xinghao Chen (Huawei Noah's Ark Lab)