Balancing Speciality and Versatility: A Coarse to Fine Framework for Mitigating Catastrophic Forgetting in Large Language Models

📅 2024-04-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
How can catastrophic forgetting (CF) during domain-specific fine-tuning of large language models (LLMs) be mitigated while preserving both general-purpose capability (versatility) and domain expertise (speciality)? This paper proposes CoFiTune, a coarse-to-fine fine-tuning framework with a two-tier adaptation mechanism: coarse-grained module selection coupled with fine-grained soft masking. An empirical tree-search algorithm pinpoints the modules most critical for speciality and updates only those, keeping all other parameters frozen, while a soft-masking mechanism regulates the magnitude of updates within the selected modules to limit forgetting. On a 13B model, CoFiTune improves versatility by roughly 14% over full-parameter SFT with only marginal speciality loss, and it consistently outperforms baseline methods across diverse tasks and model scales. The authors additionally offer a speculative account of the information-forwarding process in LLMs that helps explain the method's effectiveness.
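The fine-grained soft-masking idea described above can be sketched as a gradient update scaled by a per-parameter importance score. Everything below (the mask values, the `soft_masked_step` name, the plain SGD rule) is an illustrative assumption, not the paper's exact formulation, which derives its masks empirically inside the selected modules.

```python
def soft_masked_step(params, grads, importance, lr=0.1):
    """One SGD-style step where each gradient is scaled by (1 - importance).

    importance[i] in [0, 1]: 1 means the parameter is deemed critical for
    versatility (effectively frozen), 0 means it is free to adapt to the
    speciality task. Values in between give a soft, partial update.
    """
    return [
        p - lr * (1.0 - min(max(s, 0.0), 1.0)) * g
        for p, g, s in zip(params, grads, importance)
    ]

# Toy demonstration: four parameters, identical gradients, different importance.
params = [1.0, 1.0, 1.0, 1.0]
grads = [1.0, 1.0, 1.0, 1.0]
importance = [0.0, 0.5, 1.0, 0.25]
print(soft_masked_step(params, grads, importance))
# parameters with higher importance move less; importance 1.0 stays unchanged
```

The key design point this sketch captures is that soft masking is a continuous relaxation of freezing: instead of a binary update/no-update decision, updates shrink smoothly as importance grows.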

📝 Abstract
Aligned Large Language Models (LLMs) showcase remarkable versatility, capable of handling diverse real-world tasks. Meanwhile, aligned LLMs are also expected to exhibit speciality, excelling in specific applications. However, fine-tuning with extra data, a common practice to gain speciality, often leads to catastrophic forgetting (CF) of previously acquired versatility, hindering the model's performance across diverse tasks. In response to this challenge, we propose CoFiTune, a coarse to fine framework in an attempt to strike the balance between speciality and versatility. At the coarse-grained level, an empirical tree-search algorithm is utilized to pinpoint and update specific modules that are crucial for speciality, while keeping other parameters frozen; at the fine-grained level, a soft-masking mechanism regulates the update to the LLMs, mitigating the CF issue without harming speciality. In an overall evaluation of both speciality and versatility, CoFiTune consistently outperforms baseline methods across diverse tasks and model scales. Compared to the full-parameter SFT, CoFiTune leads to about 14% versatility improvement and marginal speciality loss on a 13B model. Lastly, based on further analysis, we provide a speculative insight into the information forwarding process in LLMs, which helps explain the effectiveness of the proposed method. The code is available at https://github.com/rattlesnakey/CoFiTune.
Problem

Research questions and friction points this paper is trying to address.

Mitigates catastrophic forgetting in LLMs
Balances speciality and versatility
Improves versatility without losing speciality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Coarse to fine framework
Empirical tree-search algorithm
Soft-masking mechanism
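The coarse-grained selection can be sketched as a greedy tree-style search over which modules to unfreeze, scoring each candidate set on a combined speciality/versatility objective. The `score_fn` interface, the toy gain/cost numbers, and the improving-children expansion rule are all illustrative assumptions, not the authors' exact procedure, which evaluates candidates empirically.

```python
def tree_search_modules(modules, score_fn, max_unfrozen=2):
    """Greedy search over subsets of modules to unfreeze.

    Starts from the all-frozen configuration and expands only children that
    improve the score, loosely mirroring an empirical tree search. score_fn
    maps a frozenset of module names to a scalar trade-off score.
    """
    best_set, best_score = frozenset(), score_fn(frozenset())
    frontier = [frozenset()]
    while frontier:
        node = frontier.pop()
        if len(node) >= max_unfrozen:
            continue  # cap how many modules may be unfrozen
        for m in modules:
            if m in node:
                continue
            child = frozenset(node | {m})
            s = score_fn(child)
            if s > best_score:
                best_set, best_score = child, s
                frontier.append(child)  # expand only improving children
    return best_set, best_score

# Toy objective: speciality gain minus versatility cost per unfrozen module
# (illustrative numbers, not measurements from the paper).
gain = {"ffn_mid": 3.0, "attn_top": 1.5, "ffn_top": 2.0}
cost = {"ffn_mid": 0.5, "attn_top": 1.0, "ffn_top": 2.5}
trade_off = lambda mods: sum(gain[m] - cost[m] for m in mods)

best, score = tree_search_modules(["ffn_mid", "attn_top", "ffn_top"], trade_off)
print(best, score)  # the subset with the best speciality/versatility trade-off
```

The greedy pruning keeps the search cheap: subsets that do not improve the score are never expanded, which matters when each evaluation means fine-tuning and benchmarking a candidate configuration.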
Hengyuan Zhang
Ph.D. Student, University of California San Diego
Robotics · Computer Vision · Autonomous Vehicles · Sensor Fusion
Yanru Wu
Tsinghua University
Dawei Li
University of California, San Diego
Sak Yang
Independent Researcher
Rui Zhao
SenseTime Research
Yong Jiang
Tsinghua University
Fei Tan
Associate Professor, East China Normal University
NLP · Data Mining · Network Science