Fine Tuning without Catastrophic Forgetting via Selective Low Rank Adaptation

📅 2025-01-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address catastrophic forgetting, degraded out-of-distribution (OOD) generalization, and the high computational overhead of large-model domain adaptation, this paper proposes a parameter-efficient fine-tuning method based on selective activation of LoRA modules. The core innovation is a learnable binary gating function, built on the Task Adaptive Parameter Sharing (TAPS) framework and low-rank decomposition, that enables fine-grained, task-aware sparsity in the LoRA updates; the method activates only ~5% of LoRA blocks. Evaluated on CLIP and DINO-ViT, it reduces trainable parameters by over 95% compared to standard LoRA, maintains or improves OOD accuracy, and significantly mitigates forgetting of prior-task knowledge. To the authors' knowledge, this is the first work within the parameter-efficient fine-tuning (PEFT) paradigm to systematically improve both OOD robustness and long-term knowledge retention.
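The gating idea in the summary can be sketched as follows. This is a minimal illustrative forward pass, not the paper's code: the function name, matrix shapes, and the simple threshold gate are assumptions. A binary indicator decides whether a LoRA block's low-rank update is added to the frozen base layer at all (during training, a TAPS-style gate would use a straight-through estimator so the gate logit still receives gradients).

```python
# Sketch of a selectively gated LoRA update for a linear layer y = W x.
# All names and shapes here are illustrative, not taken from the paper's code.
import numpy as np

def gated_lora_forward(x, W, A, B, gate_logit, threshold=0.0):
    """Forward pass with a binary gate on the low-rank update.

    W: (out, in)  frozen base weight
    A: (r, in)    trainable low-rank factor
    B: (out, r)   trainable low-rank factor
    gate_logit:   scalar learnable score; the indicator activates the
                  LoRA block only when gate_logit > threshold.
    """
    gate = 1.0 if gate_logit > threshold else 0.0  # hard binary indicator
    return W @ x + gate * (B @ (A @ x))            # inactive gate => base layer only
```

Blocks whose gate stays below threshold contribute nothing and need not be trained or stored, which is what drives the sparsity in the method.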

📝 Abstract
Adapting deep learning models to new domains often requires computationally intensive retraining and risks catastrophic forgetting. While fine-tuning enables domain-specific adaptation, it can reduce robustness to distribution shifts, impacting out-of-distribution (OOD) performance. Pre-trained zero-shot models like CLIP offer strong generalization but may suffer degraded robustness after fine-tuning. Building on Task Adaptive Parameter Sharing (TAPS), we propose a simple yet effective extension as a parameter-efficient fine-tuning (PEFT) method, using an indicator function to selectively activate Low-Rank Adaptation (LoRA) blocks. Our approach minimizes knowledge loss, retains the pre-trained model's generalization strengths under domain shifts, and significantly reduces computational costs compared to traditional fine-tuning. We demonstrate that effective fine-tuning can be achieved with as few as 5% of blocks active, substantially improving efficiency. Evaluations on pre-trained models such as CLIP and DINO-ViT demonstrate our method's broad applicability and its effectiveness in maintaining performance and retaining prior knowledge.
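A back-of-envelope count shows how activating ~5% of LoRA blocks translates into the claimed >95% reduction in trainable parameters. The dimensions below (48 attention projections across 12 layers, hidden size 768, rank 4) are typical ViT-B values chosen purely for illustration; the paper does not specify these exact numbers.

```python
# Illustrative trainable-parameter count when only a fraction of LoRA
# blocks is active. Dimensions are assumed ViT-B-style values.

def lora_params(d_in, d_out, rank):
    # Each LoRA block adds A (rank x d_in) and B (d_out x rank) factors.
    return rank * (d_in + d_out)

n_blocks = 48           # e.g. q/k/v/out projections x 12 transformer layers
d = 768                 # hidden size
rank = 4
active_fraction = 0.05  # ~5% of blocks selected by the gate

full_lora = n_blocks * lora_params(d, d, rank)
selective = int(round(n_blocks * active_fraction)) * lora_params(d, d, rank)
print(full_lora, selective, selective / full_lora)
```

With these assumed sizes, selective activation trains roughly 4% of the parameters that standard (all-blocks) LoRA would, consistent with the >95% reduction reported.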
Problem

Research questions and friction points this paper is trying to address.

Continual Learning
Knowledge Retention
Resource Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

TAPS-based Fine-tuning
LoRA Selective Activation
Efficient Adaptation