Activation-Guided Consensus Merging for Large Language Models

📅 2025-05-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing model merging methods overlook functional heterogeneity across neural components and rely on uniform layer-wise weighting assumptions. To address this, the authors propose Activation-Guided Consensus Merging (ACM), a training-free, plug-and-play layer-wise merging framework. ACM introduces an activation-guided consensus mechanism that computes layer-specific fusion coefficients from the mutual information between activations of pre-trained and fine-tuned models, requiring no gradient computations or additional training, and it composes with existing methods such as TIES-Merging. On Qwen-7B models, TIES-Merging equipped with ACM achieves a 55.3% reduction in response length while improving reasoning accuracy by 1.3 points, outperforming all baselines. Key contributions: (i) a layer fusion paradigm aware of functional heterogeneity; (ii) interpretable fusion coefficients grounded in activation mutual information; and (iii) an efficient, training-free merging framework that reduces response length while preserving accuracy.
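The core mechanism, per-layer merging coefficients derived from the mutual information between base-model and fine-tuned-model activations, can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: the histogram-based MI estimator, the max-normalization mapping from MI to a coefficient, and simple linear interpolation are all assumptions made here for concreteness.

```python
import numpy as np

def mutual_information(x, y, bins=32):
    """Estimate mutual information between two flattened activation
    vectors via 2-D histogram binning (one simple estimator; the
    paper's exact estimator may differ)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                  # joint distribution
    px = pxy.sum(axis=1, keepdims=True)        # marginal of x
    py = pxy.sum(axis=0, keepdims=True)        # marginal of y
    nz = pxy > 0                               # avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def layer_coefficients(base_acts, ft_acts):
    """Map per-layer MI between base and fine-tuned activations to a
    merging coefficient in [0, 1]. Normalizing by the max across
    layers is an illustrative choice, not necessarily the paper's
    exact mapping."""
    mi = np.array([mutual_information(b.ravel(), f.ravel())
                   for b, f in zip(base_acts, ft_acts)])
    return mi / mi.max()

def merge_layer(theta_base, theta_ft, lam):
    """Per-layer interpolation guided by the coefficient lam."""
    return (1 - lam) * theta_base + lam * theta_ft
```

Because the coefficients come from forward-pass activations only, no gradients or training are needed, which is what makes the framework plug-and-play.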

📝 Abstract
Recent research has increasingly focused on reconciling the reasoning capabilities of System 2 with the efficiency of System 1. While existing training-based and prompt-based approaches face significant challenges in terms of efficiency and stability, model merging emerges as a promising strategy to integrate the diverse capabilities of different Large Language Models (LLMs) into a unified model. However, conventional model merging methods often assume uniform importance across layers, overlooking the functional heterogeneity inherent in neural components. To address this limitation, we propose Activation-Guided Consensus Merging (ACM), a plug-and-play merging framework that determines layer-specific merging coefficients based on mutual information between activations of pre-trained and fine-tuned models. ACM effectively preserves task-specific capabilities without requiring gradient computations or additional training. Extensive experiments on Long-to-Short (L2S) and general merging tasks demonstrate that ACM consistently outperforms all baseline methods. For instance, in the case of Qwen-7B models, TIES-Merging equipped with ACM achieves a 55.3% reduction in response length while simultaneously improving reasoning accuracy by 1.3 points. We submit the code with the paper for reproducibility, and it will be publicly available.
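The abstract's "TIES-Merging equipped with ACM" suggests the layer-specific coefficients replace the single global scaling factor that TIES-style merging would otherwise apply to a trimmed task vector. A minimal sketch of that integration follows; the `keep` ratio, the dict-of-arrays weight representation, and the omission of TIES's multi-model sign-election step (this is the single-task case) are all simplifying assumptions.

```python
import numpy as np

def ties_trim(tau, keep=0.2):
    """Keep only the largest-magnitude fraction of a task vector
    (the 'trim' step of TIES-Merging; sign election is omitted in
    this single-task sketch)."""
    k = max(1, int(keep * tau.size))
    thresh = np.sort(np.abs(tau).ravel())[-k]  # k-th largest magnitude
    return np.where(np.abs(tau) >= thresh, tau, 0.0)

def acm_ties_merge(theta_base, theta_ft, lam_per_layer, keep=0.2):
    """Merge per layer: trim the task vector, then scale it by the
    layer's ACM coefficient instead of one global factor
    (hypothetical integration)."""
    merged = {}
    for name, tb in theta_base.items():
        tau = ties_trim(theta_ft[name] - tb, keep)   # task vector
        merged[name] = tb + lam_per_layer[name] * tau
    return merged
```

A layer whose coefficient is near zero stays close to the pre-trained weights, while a layer with a high coefficient absorbs most of the fine-tuned update, which is how layer-wise functional heterogeneity enters the merge.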
Problem

Research questions and friction points this paper is trying to address.

Reconciling System 2 reasoning with System 1 efficiency
Overcoming inefficiency in merging diverse LLM capabilities
Addressing functional heterogeneity in neural component merging
Innovation

Methods, ideas, or system contributions that make the work stand out.

Activation-guided layer-specific merging coefficients
Plug-and-play framework without gradient computations
Improves accuracy and reduces response length