Sculpting Subspaces: Constrained Full Fine-Tuning in LLMs for Continual Learning

📅 2025-04-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
To mitigate catastrophic forgetting in continual learning of large language models (LLMs), this paper proposes a constrained full-parameter fine-tuning method grounded in adaptive singular value decomposition (SVD). The approach introduces no additional parameters and requires no access to historical gradients; instead, it dynamically identifies task-relevant subspaces and enforces orthogonality constraints to geometrically isolate knowledge representations across tasks. Crucially, it is the first work to integrate adaptive SVD directly into the full-parameter fine-tuning paradigm, enabling joint optimization of knowledge retention and new-task adaptation. Experiments on T5-Large and LLaMA-2 7B show that the method achieves up to a 7% average-accuracy gain over baselines such as O-LoRA, reduces forgetting to near zero, and preserves instruction-following ability, logical reasoning performance, and safety alignment.

📝 Abstract
Continual learning in large language models (LLMs) is prone to catastrophic forgetting, where adapting to new tasks significantly degrades performance on previously learned ones. Existing methods typically rely on low-rank, parameter-efficient updates that limit the model's expressivity and introduce additional parameters per task, leading to scalability issues. To address these limitations, we propose a novel continual full fine-tuning approach leveraging adaptive singular value decomposition (SVD). Our method dynamically identifies task-specific low-rank parameter subspaces and constrains updates to be orthogonal to critical directions associated with prior tasks, thus effectively minimizing interference without additional parameter overhead or storing previous task gradients. We evaluate our approach extensively on standard continual learning benchmarks using both encoder-decoder (T5-Large) and decoder-only (LLaMA-2 7B) models, spanning diverse tasks including classification, generation, and reasoning. Empirically, our method achieves state-of-the-art results, up to 7% higher average accuracy than recent baselines like O-LoRA, and notably maintains the model's general linguistic capabilities, instruction-following accuracy, and safety throughout the continual learning process by reducing forgetting to near-negligible levels. Our adaptive SVD framework effectively balances model plasticity and knowledge retention, providing a practical, theoretically grounded, and computationally scalable solution for continual learning scenarios in large language models.
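The abstract's core mechanism (constraining new-task updates to be orthogonal to the critical singular directions associated with prior tasks) can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the paper's implementation: the function name, the energy-based rank criterion, and operating on a single weight matrix are all assumptions made for the sketch.

```python
import numpy as np

def project_update_orthogonal(W, dW, energy=0.9):
    """Project an update dW away from the dominant SVD subspace of W.

    Hypothetical sketch: the top singular directions of the prior-task
    weights W are treated as 'critical', and a candidate new-task update
    dW is projected onto their orthogonal complement so it cannot
    interfere with them.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    # Adaptively pick the rank k so the retained directions cover
    # `energy` of the squared singular-value mass (a stand-in for the
    # paper's task-importance criterion).
    cum = np.cumsum(S**2) / np.sum(S**2)
    k = int(np.searchsorted(cum, energy)) + 1
    Uk = U[:, :k]          # retained left singular directions
    Vk = Vt[:k, :].T       # retained right singular directions
    # Remove the components of dW that lie in the retained subspaces.
    dW_proj = dW - Uk @ (Uk.T @ dW)
    dW_proj = dW_proj - dW_proj @ Vk @ Vk.T
    return dW_proj
```

After projection, the update has exactly zero overlap with the retained directions, which is the geometric sense in which prior-task knowledge is isolated; the energy threshold is what makes the rank choice adaptive per matrix rather than fixed, as a fixed low rank would be in LoRA-style methods.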
Problem

Research questions and friction points this paper is trying to address.

Addresses catastrophic forgetting in continual learning for LLMs
Overcomes scalability issues of parameter-efficient update methods
Balances model plasticity and knowledge retention via adaptive SVD
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive SVD for dynamic subspace identification
Orthogonal updates to minimize task interference
Full fine-tuning without additional parameter overhead