Parameter-Efficient Fine-Tuning for Continual Learning: A Neural Tangent Kernel Perspective

📅 2024-07-24
🏛️ arXiv.org
📈 Citations: 4 (influential: 0)
🤖 AI Summary
Test-time forgetting in parameter-efficient fine-tuning for continual learning (PEFT-CL) remains a critical challenge. This work formalizes forgetting as a quantifiable generalization gap grounded in Neural Tangent Kernel (NTK) theory, identifying training sample size, task-level feature orthogonality, and regularization strength as its key determinants. Method: the paper proposes NTK-CL, a unified framework that eliminates task-specific parameter storage and instead enables task-aware representations via adaptive feature generation. It introduces two components: (i) an NTK-guided exponential moving average mechanism for inter-task knowledge consolidation, and (ii) a task-level feature orthogonality constraint that attenuates inter-task interference while preserving intra-task NTK structure. Contribution/Results: evaluated on standard PEFT-CL benchmarks, NTK-CL achieves state-of-the-art performance; theoretical analysis and empirical results indicate a roughly 67% reduction in the generalization gap and a threefold increase in effective feature representation dimensionality, substantially mitigating catastrophic forgetting.
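As a concrete (hypothetical) illustration of the task orthogonality constraint described above, the following PyTorch sketch penalizes cross-task feature similarity. The function name and the squared-cosine penalty form are assumptions chosen for clarity, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def orthogonality_penalty(feats_current: torch.Tensor,
                          feats_previous: torch.Tensor) -> torch.Tensor:
    """Hypothetical task-level orthogonality loss (illustration only,
    not the paper's exact formulation): penalize the squared cosine
    similarity between current-task features and stored features from
    earlier tasks, pushing task feature subspaces toward mutual
    orthogonality.

    feats_current:  (n_cur, d)  features from the current task batch.
    feats_previous: (n_prev, d) representative features from past tasks.
    """
    cur = F.normalize(feats_current, dim=1)    # unit-norm rows
    prev = F.normalize(feats_previous, dim=1)
    cross = cur @ prev.T                       # (n_cur, n_prev) cosines
    return (cross ** 2).mean()                 # 0 when all cross-task pairs are orthogonal
```

In training, one would add a weighted term such as `loss = task_loss + lam * orthogonality_penalty(cur_feats, prev_feats)`, where `lam` plays the role of the regularization strength the analysis identifies; `lam` and the feature buffers are placeholders.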

📝 Abstract
Parameter-efficient fine-tuning for continual learning (PEFT-CL) has shown promise in adapting pre-trained models to sequential tasks while mitigating the catastrophic forgetting problem. However, the mechanisms that dictate continual performance in this paradigm remain poorly understood. To unravel them, we undertake a rigorous analysis of PEFT-CL dynamics and derive metrics relevant to continual scenarios using Neural Tangent Kernel (NTK) theory. With NTK as a mathematical analysis tool, we recast the challenge of test-time forgetting as quantifiable generalization gaps during training, identifying three key factors that influence these gaps and PEFT-CL performance: training sample size, task-level feature orthogonality, and regularization. To address these challenges, we introduce NTK-CL, a novel framework that eliminates task-specific parameter storage while adaptively generating task-relevant features. In line with this theoretical guidance, NTK-CL triples the feature representation of each sample, theoretically and empirically reducing the magnitude of both task-interplay and task-specific generalization gaps. Grounded in the NTK analysis, our framework imposes an adaptive exponential moving average mechanism and task-level feature orthogonality constraints, maintaining intra-task NTK forms while attenuating inter-task NTK forms. By fine-tuning the optimizable parameters with appropriate regularization, NTK-CL achieves state-of-the-art performance on established PEFT-CL benchmarks. This work provides a theoretical foundation for understanding and improving PEFT-CL models, offering insights into the interplay between feature representation, task orthogonality, and generalization, and contributing to the development of more efficient continual learning systems.
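For readers new to the NTK framing, here is a schematic of the two central quantities, written in standard NTK notation rather than the paper's exact derivation: the empirical NTK of the network, and the per-task generalization gap into which test-time forgetting is recast.

```latex
% Empirical NTK of a network f with parameters \theta (standard notation):
\Theta(x, x') \;=\; \nabla_\theta f(x;\theta)^{\top} \, \nabla_\theta f(x';\theta)

% Test-time forgetting on task t, recast as a generalization gap:
% population risk under the task distribution D_t minus empirical risk
% on its n_t training samples.
\mathcal{G}_t \;=\; \mathbb{E}_{(x,y)\sim\mathcal{D}_t}\big[\ell(f(x),y)\big]
\;-\; \frac{1}{n_t}\sum_{i=1}^{n_t} \ell\big(f(x_i), y_i\big)
```

NTK-style bounds on $\mathcal{G}_t$ typically tighten as $n_t$ grows, as cross-task kernel entries $\Theta(x, x')$ for inputs from different tasks shrink (task-level orthogonality), and as regularization strengthens, matching the three factors the abstract identifies.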
Problem

Research questions and friction points this paper is trying to address.

Understanding mechanisms behind continual learning performance in PEFT-CL
Quantifying test-time forgetting via NTK-based generalization gaps
Improving PEFT-CL efficiency by addressing feature orthogonality and regularization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Neural Tangent Kernel (NTK) theory to analyze PEFT-CL training dynamics
Introduces NTK-CL, which adaptively generates task-relevant features without task-specific parameter storage
Applies an adaptive exponential moving average (EMA) mechanism for knowledge consolidation (a sketch follows this list)
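A minimal sketch of what an adaptive EMA consolidation step could look like for PEFT parameters. The fixed `momentum` here is a stand-in: the paper adapts this coefficient using NTK-derived signals, which this illustration omits.

```python
import torch

@torch.no_grad()
def ema_consolidate(consolidated: dict[str, torch.Tensor],
                    current: dict[str, torch.Tensor],
                    momentum: float = 0.99) -> None:
    """Illustrative EMA consolidation of PEFT parameters across tasks.

    `consolidated` holds slow-moving averages shared across tasks;
    `current` holds parameters just fine-tuned on the newest task.
    The fixed `momentum` is a placeholder for the paper's adaptive,
    NTK-guided coefficient.
    """
    for name, param in current.items():
        # In-place update: consolidated <- m * consolidated + (1 - m) * current
        consolidated[name].mul_(momentum).add_(param, alpha=1.0 - momentum)
```

Called once per task after fine-tuning, such an update maintains a single consolidated parameter set rather than per-task copies, consistent with the paper's goal of eliminating task-specific parameter storage.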
Jingren Liu
PhD student, Tianjin University
Continual Learning, Long-form Video Understanding, Unified Models
Zhong Ji
Tianjin University
Multimedia Understanding, Cross-modal Learning, Zero/Few-shot Learning
Yunlong Yu
College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, 310027, China
Jiale Cao
School of Electrical and Information Engineering, Tianjin Key Laboratory of Brain-Inspired Intelligence Technology, Tianjin University, Tianjin 300072, China, and also with the Shanghai Artificial Intelligence Laboratory, Shanghai 200232, China
Yanwei Pang
Tianjin University
Computer Vision, Image Processing, Pattern Recognition, Machine Learning
Jungong Han
Chair Professor in Computer Vision, University of Sheffield, UK, FIAPR, FAAIA
Computer Vision, Video Analytics, Machine Learning
Xuelong Li
Institute of Artificial Intelligence (TeleAI), China Telecom Corp Ltd, 31 Jinrong Street, Beijing 100033, P. R. China