🤖 AI Summary
Test-time forgetting in parameter-efficient fine-tuning for continual learning (PEFT-CL) remains a critical challenge. This work formalizes forgetting as a quantifiable generalization gap grounded in Neural Tangent Kernel (NTK) theory, identifying sample size, task-level feature orthogonality, and regularization strength as key determinants.
Method: We propose NTK-CL—a unified framework that eliminates task-specific parameter storage and instead enables task-aware representation via adaptive feature generation. It introduces two novel components: (i) an NTK-guided exponential moving average mechanism for inter-task knowledge consolidation, and (ii) a task orthogonality constraint to suppress intra-task generalization drift.
Contribution/Results: Evaluated on standard PEFT-CL benchmarks, NTK-CL achieves state-of-the-art performance. Theoretical analysis and empirical results demonstrate a ~67% reduction in generalization gap and a threefold increase in effective feature representation dimensionality, substantially mitigating catastrophic forgetting.
📝 Abstract
Parameter-efficient fine-tuning for continual learning (PEFT-CL) has shown promise in adapting pre-trained models to sequential tasks while mitigating catastrophic forgetting problem. However, understanding the mechanisms that dictate continual performance in this paradigm remains elusive. To unravel this mystery, we undertake a rigorous analysis of PEFT-CL dynamics to derive relevant metrics for continual scenarios using Neural Tangent Kernel (NTK) theory. With the aid of NTK as a mathematical analysis tool, we recast the challenge of test-time forgetting into the quantifiable generalization gaps during training, identifying three key factors that influence these gaps and the performance of PEFT-CL: training sample size, task-level feature orthogonality, and regularization. To address these challenges, we introduce NTK-CL, a novel framework that eliminates task-specific parameter storage while adaptively generating task-relevant features. Aligning with theoretical guidance, NTK-CL triples the feature representation of each sample, theoretically and empirically reducing the magnitude of both task-interplay and task-specific generalization gaps. Grounded in NTK analysis, our framework imposes an adaptive exponential moving average mechanism and constraints on task-level feature orthogonality, maintaining intra-task NTK forms while attenuating inter-task NTK forms. Ultimately, by fine-tuning optimizable parameters with appropriate regularization, NTK-CL achieves state-of-the-art performance on established PEFT-CL benchmarks. This work provides a theoretical foundation for understanding and improving PEFT-CL models, offering insights into the interplay between feature representation, task orthogonality, and generalization, contributing to the development of more efficient continual learning systems.