Dynamic Orthogonal Continual Fine-tuning for Mitigating Catastrophic Forgettings

📅 2025-09-28

📈 Citations: 0

✨ Influential: 0

career value

205K/year

🤖 AI Summary

Large language models (LLMs) suffer from catastrophic forgetting in continual learning: fine-tuning on new tasks without access to historical data leads to substantial performance degradation on previously learned tasks. To address this, we propose Dynamic Orthogonal Continual Learning (DOCL), the first method to identify functional direction drift—as opposed to parameter-space drift—as the primary cause of regularization failure in existing approaches. DOCL introduces a dynamic functional direction tracking mechanism coupled with orthogonal gradient constraints, explicitly suppressing representational interference between old and new tasks during parameter updates. Crucially, it requires no historical data storage or replay. Instead, it achieves efficient, lightweight intervention via gradient orthogonalization and adaptive update scheduling. Evaluated on standard LLM continual learning benchmarks—including CLUE and sequential SuperGLUE—DOCL consistently outperforms state-of-the-art methods, reducing average forgetting by 32% and significantly improving generalization stability. The code is publicly available.

Technology Category

Application Category

📝 Abstract

Catastrophic forgetting remains a critical challenge in continual learning for large language models (LLMs), where models struggle to retain performance on historical tasks when fine-tuning on new sequential data without access to past datasets. In this paper, we first reveal that the drift of functional directions during the fine-tuning process is a key reason why existing regularization-based methods fail in long-term LLM continual learning. To address this, we propose Dynamic Orthogonal Continual (DOC) fine-tuning, a novel approach that tracks the drift of these functional directions and dynamically updates them during the fine-tuning process. Furthermore, by adjusting the gradients of new task parameters to be orthogonal to the tracked historical function directions, our method mitigates interference between new and old tasks. Extensive experiments on various LLM continual learning benchmarks demonstrate that this approach outperforms prior methods, effectively reducing catastrophic forgetting and providing a robust tool for continuous LLM fine-tuning. Our code is available at https://github.com/meloxxxxxx/DOC.

Problem

Research questions and friction points this paper is trying to address.

Addressing catastrophic forgetting in large language models during continual learning

Mitigating functional direction drift that hinders long-term sequential fine-tuning

Reducing interference between new and old tasks through orthogonal gradient adjustment

Innovation

Methods, ideas, or system contributions that make the work stand out.

Tracks functional direction drift dynamically

Adjusts gradients to be orthogonal to history

Mitigates interference between new and old tasks

🔎 Similar Papers

An Empirical Study of Catastrophic Forgetting in Large Language Models During Continual Fine-tuning