HAM: Hierarchical Adapter Merging for Scalable Continual Learning

📅 2025-09-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Catastrophic forgetting severely hampers deep models in continual learning, and existing parameter-efficient fine-tuning methods (e.g., LoRA) suffer from poor scalability and cross-task knowledge interference due to adapter redundancy in long task sequences. To address this, we propose a hierarchical adapter fusion framework that integrates dynamic similarity-based clustering, importance-aware scaling, and sparse pruning—enabling selective knowledge transfer and efficient cross-task fusion. Our method constructs a scalable hierarchical structure atop low-rank adapters, substantially reducing both parameter overhead and inference cost. Evaluated on three vision-based continual learning benchmarks, our approach consistently achieves state-of-the-art performance as the number of tasks increases, demonstrating superior robustness and scalability. This work establishes a novel paradigm for large-scale continual learning.

📝 Abstract
Continual learning is an essential capability of human cognition, yet it poses significant challenges for current deep learning models. The primary issue is that new knowledge can interfere with previously learned information, causing the model to forget earlier knowledge in favor of the new, a phenomenon known as catastrophic forgetting. Although large pre-trained models can partially mitigate forgetting by leveraging their existing knowledge and over-parameterization, they often struggle when confronted with novel data distributions. Parameter-Efficient Fine-Tuning (PEFT) methods, such as LoRA, enable efficient adaptation to new knowledge. However, they still face challenges in scaling to dynamic learning scenarios and long sequences of tasks, as maintaining one adapter per task introduces complexity and increases the potential for interference. In this paper, we introduce Hierarchical Adapters Merging (HAM), a novel framework that dynamically combines adapters from different tasks during training. This approach enables HAM to scale effectively, allowing it to manage more tasks than competing baselines with improved efficiency. To achieve this, HAM maintains a fixed set of groups that hierarchically consolidate new adapters. For each task, HAM trains a low-rank adapter along with an importance scalar, then dynamically groups tasks based on adapter similarity. Within each group, adapters are pruned, scaled, and merged, facilitating transfer learning between related tasks. Extensive experiments on three vision benchmarks show that HAM significantly outperforms state-of-the-art methods, particularly as the number of tasks increases.
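The pipeline described in the abstract (per-task low-rank adapter, similarity-based grouping, then prune/scale/merge within a group) can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the similarity threshold, magnitude pruning, and importance-weighted averaging are assumptions standing in for HAM's actual clustering and scaling rules.

```python
# Hypothetical sketch of HAM-style adapter grouping and merging.
# Assumes: each task's LoRA adapter is a pair (A, B) encoding the
# update delta_W = B @ A; the per-task "importance" scalar is learned
# elsewhere; grouping uses cosine similarity of flattened updates.
import numpy as np

def adapter_delta(A, B):
    """Dense weight update encoded by a LoRA pair: delta_W = B @ A."""
    return B @ A

def cosine_sim(x, y):
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y) + 1e-12))

def assign_to_group(new_delta, group_centroids, threshold=0.5):
    """Place a new adapter in the most similar existing group, or
    start a new group if nothing is similar enough (assumed rule)."""
    flat = new_delta.ravel()
    if group_centroids:
        sims = [cosine_sim(flat, c.ravel()) for c in group_centroids]
        best = int(np.argmax(sims))
        if sims[best] >= threshold:
            return best
    group_centroids.append(new_delta.copy())
    return len(group_centroids) - 1

def prune(delta, keep_ratio=0.5):
    """Magnitude pruning: zero out the smallest-magnitude entries."""
    flat = np.abs(delta).ravel()
    k = int(len(flat) * (1 - keep_ratio))
    if k == 0:
        return delta
    thresh = np.partition(flat, k)[k]
    return np.where(np.abs(delta) >= thresh, delta, 0.0)

def merge_group(deltas, importances, keep_ratio=0.5):
    """Importance-weighted average of pruned adapter updates."""
    w = np.array(importances, dtype=float)
    w /= w.sum()
    pruned = [prune(d, keep_ratio) for d in deltas]
    return sum(wi * di for wi, di in zip(w, pruned))
```

Keeping a fixed number of merged group adapters, rather than one adapter per task, is what bounds parameter overhead and inference cost as the task sequence grows.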
Problem

Research questions and friction points this paper is trying to address.

Addresses catastrophic forgetting in continual learning
Scales parameter-efficient fine-tuning for dynamic tasks
Dynamically merges adapters to reduce interference
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Adapters Merging for dynamic combination
Fixed groups consolidate new adapters hierarchically
Tasks grouped by similarity for efficient transfer