LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging

📅 2024-10-22
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
📄 PDF
🤖 AI Summary
To address catastrophic forgetting in fine-tuning and performance degradation in multi-task model merging, this paper proposes LiNeS (Layer-increasing Network Scaling), a post-training editing method. LiNeS scales parameter updates linearly with layer depth: shallow layers are kept close to their pre-trained values to preserve general-purpose representations, while deeper layers retain most of their task-specific updates. The method requires no architectural modifications or changes to the training pipeline, and it also improves generalization when merging LLM policies aligned to different rewards via RLHF. Empirically, LiNeS significantly mitigates forgetting in single-task fine-tuning and suppresses negative task interference during multi-task model merging. Experiments across diverse vision and NLP benchmarks demonstrate improved out-of-distribution (OOD) generalization and consistent performance gains over mainstream merging methods, including Task Arithmetic and DARE, across models of varying scales and architectures. LiNeS is plug-and-play, computationally efficient, and highly scalable.

📝 Abstract
Fine-tuning pre-trained models has become the standard approach to endow them with specialized knowledge, but it poses fundamental challenges. In particular, (i) fine-tuning often leads to catastrophic forgetting, where improvements on a target domain degrade generalization on other tasks, and (ii) merging fine-tuned checkpoints from disparate tasks can lead to significant performance loss. To address these challenges, we introduce LiNeS, Layer-increasing Network Scaling, a post-training editing technique designed to preserve pre-trained generalization while enhancing fine-tuned task performance. LiNeS scales parameter updates linearly based on their layer depth within the network, maintaining shallow layers close to their pre-trained values to preserve general features while allowing deeper layers to retain task-specific representations. In multi-task model merging scenarios, layer-wise scaling of merged parameters reduces negative task interference. LiNeS demonstrates significant improvements in both single-task and multi-task settings across various benchmarks in vision and natural language processing. It mitigates forgetting, enhances out-of-distribution generalization, integrates seamlessly with existing multi-task model merging baselines, improving their performance across benchmarks and model sizes, and can boost generalization when merging LLM policies aligned with different rewards via RLHF. Our method is simple to implement, computationally efficient, and complementary to many existing techniques. Our source code is available at https://github.com/wang-kee/LiNeS.
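The scaling rule described in the abstract can be sketched in a few lines: each layer's update (fine-tuned minus pre-trained weights) is rescaled by a coefficient that grows linearly with depth. This is an illustrative sketch, not the paper's implementation; the endpoint coefficients `lam_min`/`lam_max` and the plain-list parameter representation are assumptions for clarity.

```python
def lines_scale(pretrained, finetuned, lam_min=0.1, lam_max=1.0):
    """Apply LiNeS-style layer-increasing scaling to parameter updates.

    pretrained, finetuned: per-layer parameter lists, ordered shallow -> deep
    (plain lists of floats here for illustration).
    lam_min / lam_max: assumed endpoint coefficients, not the paper's
    tuned values.
    """
    L = len(pretrained)
    edited = []
    for l, (pre, ft) in enumerate(zip(pretrained, finetuned)):
        # Interpolation factor grows linearly with depth: shallow layers
        # stay near the pre-trained weights, deep layers keep most of
        # their task-specific update.
        frac = l / (L - 1) if L > 1 else 1.0
        lam = lam_min + (lam_max - lam_min) * frac
        edited.append([p + lam * (f - p) for p, f in zip(pre, ft)])
    return edited
```

For a 3-layer toy model whose update is 1.0 everywhere, the edited updates become 0.1, 0.55, and 1.0 from shallow to deep, illustrating how general shallow features are preserved while deep task-specific features are kept.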
Problem

Research questions and friction points this paper is trying to address.

Catastrophic forgetting when fine-tuning pre-trained models
Performance degradation when merging fine-tuned checkpoints from disparate tasks
Tension between preserving pre-trained generalization and improving task-specific performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Layer-increasing Network Scaling, a post-training editing technique
Linear scaling of parameter updates based on layer depth
Reduced negative task interference in multi-task merging
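In the multi-task setting, the same depth-dependent scaling can be applied to a merged update rather than a single checkpoint. The sketch below merges checkpoints task-arithmetic style (averaging their task vectors) and then rescales the merged update layer-wise; the averaging choice and the coefficients are illustrative assumptions, not the paper's exact merging recipe.

```python
def merge_with_lines(pretrained, task_checkpoints, lam_min=0.2, lam_max=1.0):
    """Merge fine-tuned checkpoints by averaging their task vectors,
    then apply layer-increasing scaling to the merged update so shallow
    layers stay close to the pre-trained model.

    pretrained: per-layer parameter lists, ordered shallow -> deep.
    task_checkpoints: list of models with the same per-layer structure.
    lam_min / lam_max: assumed endpoint coefficients.
    """
    L = len(pretrained)
    n = len(task_checkpoints)
    merged = []
    for l in range(L):
        # Average task vector (update relative to pre-trained) at this layer.
        avg_update = [
            sum(ckpt[l][i] - pretrained[l][i] for ckpt in task_checkpoints) / n
            for i in range(len(pretrained[l]))
        ]
        # Depth-dependent scaling, as in the single-task case.
        frac = l / (L - 1) if L > 1 else 1.0
        lam = lam_min + (lam_max - lam_min) * frac
        merged.append([p + lam * u for p, u in zip(pretrained[l], avg_update)])
    return merged
```

Downscaling shallow layers of the merged model is what the summary refers to as reducing negative task interference: conflicting shallow updates from different tasks are attenuated, while deep task-specific representations are largely retained.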