Vanishing Feature: Diagnosing Model Merging and Beyond

📅 2024-02-05
📈 Citations: 3
Influential: 0
🤖 AI Summary
This work identifies a pervasive "vanishing feature" phenomenon in model merging and pruning: input-induced features progressively attenuate as they propagate through deeper layers, causing variance collapse, the failure of permutation-based merging, and sharp accuracy degradation at high pruning ratios. To address this, the authors propose Preserve-First Merging (PFM), a strategy that prioritizes preserving early-layer features and, for the first time, enables merged models to outperform the source models without fine-tuning. They further show that vanishing features similarly impair pruned models, motivating a lightweight post-pruning normalization technique. Through theoretical modeling and extensive empirical validation, they demonstrate that PFM consistently outperforms the source models across diverse multi-task merging scenarios, while post-pruning normalization significantly improves one-shot pruning accuracy at 90% sparsity. The implementation is publicly available.

📝 Abstract
Model merging offers an efficient way to combine pre-trained neural networks but often suffers from inconsistent performance, especially when merging models with different initializations. We identify the "vanishing feature" phenomenon, where input-induced features diminish during propagation through the merged model, degrading performance. Through theoretical and empirical analysis, we reveal that this phenomenon underpins challenges like variance collapse and explains techniques like permutation-based merging, post-merging normalization, etc. We show that existing normalization strategies can be enhanced by precisely targeting the vanishing feature issue. Leveraging these insights, we propose the "Preserve-First Merging" (PFM) strategy, which focuses on preserving early-layer features, enabling the merged models, for the first time, to outperform the original models in advanced settings without post-training. Furthermore, we demonstrate that the vanishing feature phenomenon extends to other contexts, such as model pruning. Applying post-pruning normalization to mitigate the issue significantly improves one-shot pruning performance at high sparsity, offering a simple and effective post-pruning solution. The code is available at https://github.com/XingyuQu/VF.
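The core diagnostic can be reproduced in a few lines. The sketch below (a minimal NumPy illustration, not the paper's released code; the network, permutation construction, and merge rule are all toy assumptions) builds a deep ReLU network, creates a functionally equivalent copy whose hidden units are permuted layer by layer, and then averages the two weight sets without aligning the permutations. Tracking the per-layer feature norm of the merged model against the source model shows the vanishing-feature effect: the ratio shrinks with depth.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def forward(weights, x, norms):
    """Propagate x through the ReLU net, recording each layer's feature norm."""
    h = x
    for w in weights:
        h = relu(w @ h)
        norms.append(np.linalg.norm(h))
    return h

depth, dim = 6, 64
# Model A: a deep ReLU net with He-scaled random weights.
ws_a = [rng.standard_normal((dim, dim)) * np.sqrt(2.0 / dim) for _ in range(depth)]

# Model B: functionally identical to A up to a permutation of each hidden layer
# (permute output rows, undo the previous layer's permutation on the inputs).
ws_b, prev = [], np.arange(dim)
for w in ws_a:
    p = rng.permutation(dim)
    ws_b.append(w[p][:, prev])
    prev = p

# Naive merge: average weights WITHOUT aligning the hidden-unit permutations.
ws_m = [(wa + wb) / 2 for wa, wb in zip(ws_a, ws_b)]

x = rng.standard_normal(dim)
norms_a, norms_m = [], []
forward(ws_a, x, norms_a)
forward(ws_m, x, norms_m)

# Ratio of merged-model to source-model feature norm at each layer.
# Misaligned averaging attenuates features multiplicatively, so the
# ratio decays with depth: the "vanishing feature" signature.
ratios = [m / a for m, a in zip(norms_m, norms_a)]
```

Rescaling each merged layer's activations back to the source model's norms (the post-merging normalization discussed in the abstract) removes exactly this decay, which is why such normalization, and PFM's emphasis on keeping early-layer features intact, recovers accuracy without post-training.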
Problem

Research questions and friction points this paper is trying to address.

Feature Vanishing
Model Fusion
Pruning Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Preserve-First Merging (PFM)
Vanishing Feature Diagnosis
Post-Pruning Normalization