CRAFT: Forgetting-Aware Intervention-Based Adaptation for Continual Learning

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses catastrophic forgetting in large language models during continual learning. The authors propose CRAFT, a framework that unifies task routing, regularization, and merging under a single KL divergence-based objective. Rather than updating model weights as LoRA does, CRAFT learns low-rank interventions on hidden representations. It routes each task to a group of similar tasks based on output-distribution divergence, regularizes fine-tuning against that group's prior state with a KL term, and merges the resulting interventions into the shared representation using the same signal. The authors report that CRAFT improves overall performance and reduces forgetting relative to strong LoRA-based baselines across multiple benchmarks and model scales, while remaining robust to task ordering.
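To make the idea of a low-rank intervention on hidden representations concrete, here is a minimal sketch. It is not taken from the paper: the additive form `h + B(A h)`, the zero initialization of `B`, and the names `make_intervention` and `intervene` are all assumptions chosen for illustration. The only property it is meant to show is that the base model's weights are never touched; only the small matrices `A` and `B` would be trained.

```python
import random

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def make_intervention(d, r, seed=0):
    """Create hypothetical intervention parameters for hidden size d, rank r.

    A (r x d) projects the hidden state down to r dimensions.
    B (d x r) projects back up; it starts at zero so the intervention
    is initially the identity (an assumed, LoRA-style initialization).
    """
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 0.02) for _ in range(d)] for _ in range(r)]
    B = [[0.0] * r for _ in range(d)]
    return A, B

def intervene(h, A, B):
    """Return the edited hidden state h + B(A h); the base model is untouched."""
    z = matvec(A, h)        # low-rank code, length r
    delta = matvec(B, z)    # edit in hidden space, length d
    return [h_i + d_i for h_i, d_i in zip(h, delta)]

if __name__ == "__main__":
    h = [1.0, 2.0, 3.0, 4.0]
    A, B = make_intervention(d=4, r=2)
    print(intervene(h, A, B) == h)  # zero-initialized B leaves h unchanged
```

With rank `r` much smaller than hidden size `d`, each intervention stores `2 * d * r` numbers, which is what makes keeping one per task or group inexpensive in this sketch.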

📝 Abstract

Large language models (LLMs) can acquire new capabilities through fine-tuning, but continual adaptation often leads to catastrophic forgetting. We propose CRAFT, a continual learning framework that avoids updating model weights by instead learning low-rank interventions on hidden representations. CRAFT proceeds in three stages: it first routes each task to a group of similar tasks based on output-distribution divergence; it then fine-tunes the model using a Kullback-Leibler (KL) divergence against the group's prior state, which directly controls forgetting and determines convergence; finally, it merges interventions for the updated task into the shared representation using the same KL signal. This design unifies routing, regularization, and merging through a single KL-based objective. CRAFT improves overall performance and reduces forgetting compared to strong LoRA-based approaches across multiple benchmarks and model scales, while remaining robust to task ordering. These results suggest that controlling adaptation in representation space, guided by output-space divergence, provides a scalable and principled approach to continual learning in LLMs.
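The routing and regularization stages described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the direction of the KL term, the `threshold` for opening a new group, the weight `beta`, and the function names `route_task` and `regularized_loss` are all hypothetical, and the distributions are toy three-way probability vectors rather than real model outputs.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of floats."""
    return sum(
        p_i * math.log((p_i + eps) / (q_i + eps))
        for p_i, q_i in zip(p, q)
        if p_i > 0.0
    )

def route_task(task_dist, group_dists, threshold):
    """Pick the group whose output distribution is closest to the task's.

    Returns (group_name, kl). group_name is None when no group is within
    the (assumed) threshold, signalling that a new group would be needed.
    """
    best_name, best_kl = None, float("inf")
    for name, dist in group_dists.items():
        kl = kl_divergence(task_dist, dist)
        if kl < best_kl:
            best_name, best_kl = name, kl
    if best_kl > threshold:
        return None, best_kl
    return best_name, best_kl

def regularized_loss(task_loss, prior_dist, current_dist, beta):
    """Task loss plus a KL penalty for drifting from the group's prior state."""
    return task_loss + beta * kl_divergence(prior_dist, current_dist)

if __name__ == "__main__":
    groups = {"qa": [0.7, 0.2, 0.1], "code": [0.1, 0.2, 0.7]}
    name, kl = route_task([0.6, 0.3, 0.1], groups, threshold=0.5)
    print(name, round(kl, 4))
```

The point of the sketch is that one quantity, the KL divergence between output distributions, serves both as the routing distance and as the forgetting penalty, which mirrors the abstract's claim of a single KL-based signal.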

Problem

Research questions and friction points this paper is trying to address.

continual learning

catastrophic forgetting

large language models

adaptation

Innovation

Methods, ideas, or system contributions that make the work stand out.

continual learning

low-rank intervention

catastrophic forgetting

KL divergence