Reflecting on the State of Rehearsal-free Continual Learning with Pretrained Models

📅 2024-06-13
🏛️ CoLLAs
📈 Citations: 6
Influential: 0
🤖 AI Summary
This work investigates the true performance-driving mechanisms underlying rehearsal-free continual learning (RFCL) with pretrained models. Through systematic ablation and mechanistic attribution, the authors find that mainstream parameter-efficient fine-tuning-based RFCL (P-RFCL) methods, despite their claimed reliance on sophisticated query or prompting mechanisms, actually operate via implicit parameter constraints: most collapse to standard PEFT "shortcut" solutions. They further uncover an implicit upper bound on the number of tunable parameters and identify this constraint as a critical commonality behind effective RFCL performance. Guided by these insights, they propose a lightweight PEFT baseline that matches state-of-the-art P-RFCL methods across multiple RFCL benchmarks. The study also clarifies the distinction between continual adaptation and first-task adaptation, and re-evaluates classical regularization methods (e.g., EWC, SI): though superseded in modern PEFT contexts, their core principle of explicit parameter constraint remains foundational and highly relevant to RFCL.
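The "explicit parameter constraint" that EWC embodies can be sketched as a quadratic penalty anchoring each parameter to its value after the previous task, weighted by an importance estimate (the diagonal Fisher information). A minimal illustrative sketch in plain Python, with all names and numbers hypothetical:

```python
# Illustrative sketch of EWC's explicit parameter constraint (not the paper's
# code): the new-task loss is augmented with a quadratic penalty anchoring
# each parameter theta_i to its post-previous-task value theta_star_i,
# weighted by a per-parameter importance estimate F_i (diagonal Fisher).

def ewc_penalty(theta, theta_star, fisher, lam=1.0):
    """Quadratic regularizer: (lam/2) * sum_i F_i * (theta_i - theta_star_i)^2."""
    return 0.5 * lam * sum(
        f * (t - ts) ** 2 for t, ts, f in zip(theta, theta_star, fisher)
    )

# Parameters that drift on important (high-Fisher) dimensions are penalized
# heavily; unimportant dimensions remain free to adapt to the new task.
theta_old = [1.0, -0.5, 2.0]   # hypothetical parameters after previous task
fisher    = [4.0,  0.1, 1.0]   # hypothetical per-parameter importance
theta_new = [1.5, -2.0, 2.0]   # candidate parameters on the new task

penalty = ewc_penalty(theta_new, theta_old, fisher, lam=2.0)
```

P-RFCL methods achieve a similar effect implicitly: restricting optimization to a small adapter bounds how far the model can move, without an explicit penalty term.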

📝 Abstract
With the advent and recent ubiquity of foundation models, continual learning (CL) has recently shifted from continual training from scratch to the continual adaptation of pretrained models, seeing particular success on rehearsal-free CL benchmarks (RFCL). To achieve this, most proposed methods adapt and restructure parameter-efficient finetuning techniques (PEFT) to suit the continual nature of the problem. Based most often on input-conditional query-mechanisms or regularizations on top of prompt- or adapter-based PEFT, these PEFT-style RFCL (P-RFCL) approaches report peak performances; often convincingly outperforming existing CL techniques. However, on the other end, critical studies have recently highlighted competitive results by training on just the first task or via simple non-parametric baselines. Consequently, questions arise about the relationship between methodological choices in P-RFCL and their reported high benchmark scores. In this work, we tackle these questions to better understand the true drivers behind strong P-RFCL performances, their placement w.r.t. recent first-task adaptation studies, and their relation to preceding CL standards such as EWC or SI. In particular, we show: (1) P-RFCL techniques relying on input-conditional query mechanisms work not because, but rather despite them by collapsing towards standard PEFT shortcut solutions. (2) Indeed, we show how most often, P-RFCL techniques can be matched by a simple and lightweight PEFT baseline. (3) Using this baseline, we identify the implicit bound on tunable parameters when deriving RFCL approaches from PEFT methods as a potential denominator behind P-RFCL efficacy. Finally, we (4) better disentangle continual versus first-task adaptation, and (5) motivate standard RFCL techniques s.a. EWC or SI in light of recent P-RFCL methods.
Problem

Research questions and friction points this paper is trying to address.

Understanding the true performance drivers in rehearsal-free continual learning with pretrained models
Evaluating the relationship between methodological choices in P-RFCL and their reported benchmark scores
Disentangling genuine continual adaptation from simple first-task adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Systematic analysis showing input-conditional query mechanisms collapse toward standard PEFT shortcut solutions
Identification of the implicit bound on tunable parameters as a common driver of P-RFCL efficacy
Simple, lightweight PEFT baseline matching complex P-RFCL methods
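The lightweight-baseline idea can be sketched as freezing the pretrained backbone and tuning only a small, fixed budget of extra parameters, which implicitly bounds model drift across tasks. A hedged illustration (module names and counts are hypothetical, not the paper's exact configuration):

```python
# Sketch of the implicit tunable-parameter bound behind PEFT-style RFCL
# (illustrative only): freeze the pretrained backbone and optimize only a
# small adapter/head, so the trainable budget is a tiny fixed fraction.

def split_trainable(param_counts, tunable_names):
    """Return (tunable, frozen) parameter totals from a name -> count map."""
    tunable = sum(n for name, n in param_counts.items() if name in tunable_names)
    frozen = sum(n for name, n in param_counts.items() if name not in tunable_names)
    return tunable, frozen

# Hypothetical ViT-style backbone with a small tunable prompt module and head.
params = {
    "backbone.blocks": 85_000_000,  # frozen pretrained transformer blocks
    "backbone.embed":   1_000_000,  # frozen patch embedding
    "adapter.prompts":     20_000,  # tunable prompt/adapter parameters
    "head.classifier":     80_000,  # tunable classification head
}
tunable, frozen = split_trainable(params, {"adapter.prompts", "head.classifier"})
ratio = tunable / (tunable + frozen)  # well under 1% of all parameters
```

Keeping this ratio small acts as an implicit analogue of EWC's explicit penalty: the model simply cannot move far from its pretrained solution.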