CLDyB: Towards Dynamic Benchmarking for Continual Learning with Pre-trained Models

📅 2025-03-06

🤖 AI Summary

Current continual learning (CL) evaluation faces two critical bottlenecks: (1) performance saturation on static benchmarks, which limits their ability to reflect real-world task dynamics; and (2) insufficient consideration of data contamination during pre-training. To address these, we propose CLDyB, the first algorithm-aware dynamic benchmarking framework. CLDyB models task evolution as a Markov decision process and employs Monte Carlo tree search to automatically discover challenging task sequences for the CL methods under evaluation. It supports two evaluation modes: a joint mode, which evaluates multiple CL methods together and yields commonly challenging, generalizable task sequences, and a per-method mode, which evaluates each CL method separately to reveal its individual strengths and weaknesses. Experiments with state-of-the-art CL methods show that they tend to perform poorly on the discovered sequences. We publicly release all code and generated task sequences to support reproducible CL evaluation.
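The difference between the two evaluation modes can be illustrated with a small sketch. It is not taken from the CLDyB code base: the method names, the accuracy numbers, and the choice of the mean as the joint aggregate are all assumptions made for illustration.

```python
from statistics import mean

# Hypothetical final accuracies of three CL methods on two candidate task
# sequences. Names and numbers are placeholders, not results from the paper.
ACCURACY = {
    "seq_a": {"method_1": 0.82, "method_2": 0.64, "method_3": 0.70},
    "seq_b": {"method_1": 0.58, "method_2": 0.75, "method_3": 0.62},
}


def individual_difficulty(sequence, method):
    """Error of one method on a sequence (per-method evaluation)."""
    return 1.0 - ACCURACY[sequence][method]


def joint_difficulty(sequence):
    """Mean error over all methods (joint evaluation).

    Using the mean as the aggregate is an assumption of this sketch.
    """
    return mean(1.0 - acc for acc in ACCURACY[sequence].values())


def hardest_sequence(score):
    """Return the candidate sequence with the highest difficulty score."""
    return max(ACCURACY, key=score)


hardest_joint = hardest_sequence(joint_difficulty)
hardest_for_method_2 = hardest_sequence(
    lambda seq: individual_difficulty(seq, "method_2")
)
print(hardest_joint, hardest_for_method_2)
```

With these placeholder numbers the two modes disagree: the sequence that is hardest on average is not the one that is hardest for `method_2`, which is why a per-method search can surface weaknesses that a joint search hides.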

📝 Abstract

The advent of the foundation model era has sparked significant research interest in leveraging pre-trained representations for continual learning (CL), yielding a series of top-performing CL methods on standard evaluation benchmarks. Nonetheless, there are growing concerns regarding potential data contamination during the pre-training stage. Furthermore, standard evaluation benchmarks, which are typically static, fail to capture the complexities of real-world CL scenarios, resulting in saturated performance. To address these issues, we describe CL on dynamic benchmarks (CLDyB), a general computational framework based on Markov decision processes for evaluating CL methods reliably. CLDyB dynamically identifies inherently difficult and algorithm-dependent tasks for the given CL methods, and determines challenging task orders using Monte Carlo tree search. Leveraging CLDyB, we first conduct a joint evaluation of multiple state-of-the-art CL methods, leading to a set of commonly challenging and generalizable task sequences where existing CL methods tend to perform poorly. We then conduct separate evaluations of individual CL methods using CLDyB, discovering their respective strengths and weaknesses. The source code and generated task sequences are publicly accessible at https://github.com/szc12153/CLDyB.
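To make the search over task orders concrete, here is a minimal sketch of Monte Carlo tree search with the standard UCT selection rule applied to a toy task pool. It is an illustration under stated assumptions, not the authors' implementation: `toy_difficulty`, `Node`, `search`, and every parameter value are hypothetical. In an actual benchmark the reward would come from training and evaluating the CL method on the candidate sequence rather than from a closed-form function.

```python
import math
import random

MAX_GAP = 5  # largest possible gap between task ids in the toy pool below


def toy_difficulty(sequence):
    """Stand-in reward in [0, 1]: bigger jumps between consecutive task ids
    count as harder. A real reward would be a measured quantity such as the
    CL method's error on the sequence (assumption of this sketch)."""
    gaps = sum(abs(a - b) for a, b in zip(sequence, sequence[1:]))
    return gaps / (MAX_GAP * (len(sequence) - 1))


class Node:
    def __init__(self, sequence, parent=None):
        self.sequence = sequence  # tasks chosen so far (the MDP state)
        self.parent = parent
        self.children = {}        # next task (the action) -> child node
        self.visits = 0
        self.total = 0.0


def untried(node, pool):
    """Tasks that are neither in the sequence nor already expanded."""
    return [t for t in pool if t not in node.sequence and t not in node.children]


def uct_child(node, c):
    """Pick the child maximising mean reward plus an exploration bonus."""
    return max(
        node.children.values(),
        key=lambda ch: ch.total / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def search(pool, horizon, reward, iterations=2000, c=1.4, seed=0):
    """Search for a hard length-`horizon` task order; needs horizon <= len(pool)."""
    rng = random.Random(seed)
    root = Node(())
    for _ in range(iterations):
        node = root
        # Selection: descend while the node is fully expanded and not terminal.
        while len(node.sequence) < horizon and not untried(node, pool):
            node = uct_child(node, c)
        # Expansion: add one new child unless the node is terminal.
        if len(node.sequence) < horizon:
            task = rng.choice(untried(node, pool))
            child = Node(node.sequence + (task,), parent=node)
            node.children[task] = child
            node = child
        # Rollout: complete the sequence with randomly ordered unused tasks.
        rest = [t for t in pool if t not in node.sequence]
        rng.shuffle(rest)
        full = node.sequence + tuple(rest[: horizon - len(node.sequence)])
        value = reward(full)
        # Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.total += value
            node = node.parent
    # Read out the most-visited path as the proposed task order.
    node = root
    while node.children:
        node = max(node.children.values(), key=lambda ch: ch.visits)
    return node.sequence


best = search(pool=list(range(6)), horizon=4, reward=toy_difficulty)
print(best, toy_difficulty(best))
```

The state is the partial task sequence and each action appends an unused task, which is one simple way to cast task ordering as a Markov decision process. Swapping `toy_difficulty` for a joint score over several methods or a score for a single method changes what the search treats as challenging.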

Problem

Research questions and friction points this paper is trying to address.

Addresses data contamination in pre-trained models for continual learning.

Introduces dynamic benchmarks to evaluate real-world continual learning complexities.

Identifies challenging task sequences where existing methods underperform.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic benchmarking using Markov decision processes

Monte Carlo tree search for task ordering

Evaluation of continual learning methods on challenging tasks
