🤖 AI Summary
This work addresses catastrophic forgetting and the high cost of retraining in continual learning for large models by proposing Share, the first method to achieve strict continual learning without data replay or multiple adapters. Share constructs and dynamically updates a shared low-rank subspace, combining LoRA-based parameter-efficient fine-tuning with core-knowledge extraction and incremental direction fusion to enable forward knowledge transfer across tasks and modalities while mitigating interference. Compared to conventional LoRA, Share reduces trainable parameters by up to 100× and memory consumption by up to 281× while matching the performance of joint training. Its effectiveness and scalability are validated across diverse tasks, including image classification, natural language understanding, 3D pose estimation, and text-to-image generation.
📝 Abstract
Adapting large pretrained models to new tasks efficiently and continually is crucial for real-world deployment but remains challenging due to catastrophic forgetting and the high cost of retraining. While parameter-efficient tuning methods such as low-rank adaptation (LoRA) reduce computational demands, they lack mechanisms for strict continual learning and knowledge integration without relying on data replay or multiple adapters. We propose Share, a novel approach to parameter-efficient continual fine-tuning that learns and dynamically updates a single, shared low-rank subspace, enabling seamless adaptation across multiple tasks and modalities. Share constructs a foundational subspace that extracts core knowledge from past tasks and incrementally integrates new information by identifying essential subspace directions. Knowledge from each new task is incorporated into this evolving subspace, facilitating forward knowledge transfer while minimizing catastrophic interference. This approach achieves up to 100x parameter reduction and 281x memory savings over traditional LoRA methods while maintaining performance comparable to jointly trained models. A single Share model can replace hundreds of task-specific LoRA adapters, supporting scalable, asynchronous continual learning. Experiments across image classification, natural language understanding, 3D pose estimation, and text-to-image generation validate its effectiveness, making Share a practical and scalable solution for lifelong learning in large-scale AI systems.
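The abstract does not give Share's exact update rules, but the general idea it describes — maintaining one shared low-rank subspace and fusing each new task's principal update directions into it, rather than keeping a separate LoRA adapter per task — can be illustrated with a minimal sketch. Everything below (the `fuse_into_subspace` helper, the SVD-based truncation, the random stand-in for a per-task LoRA delta) is a hypothetical construction for illustration, not the paper's algorithm:

```python
import numpy as np

def fuse_into_subspace(U, delta_W, rank):
    """Fuse a new task's weight update into a shared low-rank subspace.

    U       : (d, r) orthonormal basis of the shared subspace, or None
              before the first task.
    delta_W : (d, d) task-specific update whose principal directions
              should be absorbed.
    rank    : maximum rank budget of the shared subspace.
    """
    if U is None:
        candidates = delta_W
    else:
        # Keep the existing basis plus the component of the new update
        # that lies outside the current subspace.
        residual = delta_W - U @ (U.T @ delta_W)
        candidates = np.hstack([U, residual])
    # Extract principal directions via SVD and truncate to the budget,
    # so the subspace rank stays fixed as tasks arrive.
    Q, _, _ = np.linalg.svd(candidates, full_matrices=False)
    return Q[:, :rank]

rng = np.random.default_rng(0)
U = None
for _task in range(5):
    # Stand-in for a per-task low-rank LoRA delta: B @ A with rank 4.
    B = rng.standard_normal((64, 4))
    A = rng.standard_normal((4, 64))
    U = fuse_into_subspace(U, B @ A, rank=8)
```

After processing all tasks, a single (64, 8) basis stands in for five task-specific adapters; the rank budget, not the number of tasks, bounds the stored parameters, which is the scaling property the abstract claims.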