HiDe-PET: Continual Learning via Hierarchical Decomposition of Parameter-Efficient Tuning

📅 2024-07-07
🏛️ arXiv.org
📈 Citations: 4
Influential: 0
📄 PDF
🤖 AI Summary
Pre-trained models (PTMs) suffer from catastrophic forgetting and unstable performance in continual learning (CL) when freezing the backbone and applying prompt-based prompt-tuning (PET). Method: We propose HiDe-PET, the first framework to theoretically decompose the CL objective into three hierarchical, cooperative sub-goals: intra-task prediction, task identity inference, and task-adaptive prediction. It integrates parameter-efficient techniques—including prompt learning, adapters, and LoRA—with task ID embeddings, hierarchical attention gating, and progressive knowledge distillation to jointly model shared representations and task-specific knowledge while efficiently recovering pre-trained representations. Results: Evaluated across multiple CL benchmarks, HiDe-PET achieves a 5.2% average accuracy gain and reduces forgetting by 38% over prompt-based PET and state-of-the-art baselines, demonstrating the effectiveness and generalizability of hierarchical decomposition for continual prompt tuning.

Technology Category

Application Category

📝 Abstract
The deployment of pre-trained models (PTMs) has greatly advanced the field of continual learning (CL), enabling positive knowledge transfer and resilience to catastrophic forgetting. To sustain these advantages for sequentially arriving tasks, a promising direction involves keeping the pre-trained backbone frozen while employing parameter-efficient tuning (PET) techniques to instruct representation learning. Despite the popularity of Prompt-based PET for CL, its empirical design often leads to sub-optimal performance in our evaluation of different PTMs and target tasks. To this end, we propose a unified framework for CL with PTMs and PET that provides both theoretical and empirical advancements. We first perform an in-depth theoretical analysis of the CL objective in a pre-training context, decomposing it into hierarchical components namely within-task prediction, task-identity inference and task-adaptive prediction. We then present Hierarchical Decomposition PET (HiDe-PET), an innovative approach that explicitly optimizes the decomposed objective through incorporating task-specific and task-shared knowledge via mainstream PET techniques along with efficient recovery of pre-trained representations. Leveraging this framework, we delve into the distinct impacts of implementation strategy, PET technique and PET architecture, as well as adaptive knowledge accumulation amidst pronounced distribution changes. Finally, across various CL scenarios, our approach demonstrates remarkably superior performance over a broad spectrum of recent strong baselines.
Problem

Research questions and friction points this paper is trying to address.

Optimize continual learning with pre-trained models and parameter-efficient tuning
Address sub-optimal performance in prompt-based PET for CL
Decompose CL objectives into hierarchical components for better adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical decomposition of CL objective
Task-specific and shared knowledge optimization
Efficient recovery of pre-trained representations
🔎 Similar Papers
No similar papers found.
Liyuan Wang
Liyuan Wang
Tsinghua University
bio-inspired learningcontinual learningneuroscience
Jingyi Xie
Jingyi Xie
Assistant Professor, San José State University
Human-Computer InteractionAccessibilityHuman-Centered AI
X
Xingxing Zhang
Dept. of Comp. Sci. & Tech., Institute for AI, BNRist Center, THBI Lab, Tsinghua-Bosch Joint Center for ML, Tsinghua University, Beijing, China
H
Hang Su
Dept. of Comp. Sci. & Tech., Institute for AI, BNRist Center, THBI Lab, Tsinghua-Bosch Joint Center for ML, Tsinghua University, Beijing, China
J
Jun Zhu
Dept. of Comp. Sci. & Tech., Institute for AI, BNRist Center, THBI Lab, Tsinghua-Bosch Joint Center for ML, Tsinghua University, Beijing, China