🤖 AI Summary
In exemplar-free continual learning (EFCL), pretrained models suffer severe catastrophic forgetting, exacerbated by dual imbalances in real-world data streams: inter-task distributional shift and intra-task long-tailed (or reverse-skewed) class distributions.
Method: We propose a region-aware, distribution-adaptive enhancement framework. First, we formally characterize and model this dual imbalance. Second, leveraging CLIP to localize semantic key regions, we perform cross-class patch transplantation to strengthen representations of few-shot classes. Third, we dynamically adjust sampling weights based on historical task distributions to achieve inter-task learning balance. The framework freezes the backbone, ensuring lightweight efficiency and compatibility with diverse pretrained models.
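The cross-class patch transplantation step can be sketched as follows. This is an illustrative toy, not the paper's implementation: `region_scores` stands in for CLIP patch-relevance scores (no real CLIP encoder is run), and the patch size, grid layout, and top-k selection are assumed details.

```python
import numpy as np

def transplant_patches(tail_img, head_img, region_scores, patch=8, k=4):
    """Copy the k highest-scoring patches from a tail-class image into a
    head-class image. `region_scores` is a per-patch relevance grid that
    stands in for CLIP semantic-region scores in this sketch."""
    h, w = tail_img.shape[:2]
    gh, gw = h // patch, w // patch
    # Flatten the patch grid and pick the k most relevant cells.
    flat = region_scores[:gh, :gw].reshape(-1)
    top = np.argsort(flat)[-k:]
    out = head_img.copy()
    for idx in top:
        r, c = divmod(int(idx), gw)
        ys, xs = r * patch, c * patch
        out[ys:ys + patch, xs:xs + patch] = tail_img[ys:ys + patch, xs:xs + patch]
    return out
```

A synthesized sample keeps the head-class background but carries the tail class's most discriminative regions, giving few-shot classes more training signal per task.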
Results: Our method achieves significant accuracy gains across mainstream EFCL benchmarks, markedly mitigates forgetting, and demonstrates superior robustness and generalization—particularly under long-tailed scenarios.
📝 Abstract
Exemplar-Free Continual Learning (EFCL) restricts the storage of previous task data and is highly susceptible to catastrophic forgetting. While pre-trained models (PTMs) are increasingly leveraged for EFCL, existing methods often overlook the inherent imbalance of real-world data distributions. We discovered that real-world data streams commonly exhibit dual-level imbalances: dataset-level distributional shifts combined with extreme or reversed skews within individual tasks, creating both intra-task and inter-task disparities that hinder effective learning and generalization. To address these challenges, we propose PANDA, a Patch-and-Distribution-Aware Augmentation framework that integrates seamlessly with existing PTM-based EFCL methods. PANDA amplifies low-frequency classes by using a CLIP encoder to identify representative regions and transplanting those regions into frequent-class samples within each task. Furthermore, PANDA incorporates an adaptive balancing strategy that leverages prior task distributions to smooth inter-task imbalances, reducing the gap in average sample counts across tasks and enabling fairer learning with frozen PTMs. Extensive experiments and ablation studies demonstrate PANDA's ability to work with existing PTM-based CL methods, improving accuracy and reducing catastrophic forgetting.
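The adaptive inter-task balancing idea can be sketched as inverse-frequency sampling weights smoothed by the history of earlier tasks. The inverse-frequency form and the mixing coefficient `alpha` below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def balanced_sampling_weights(class_counts, history_counts, alpha=0.5):
    """Per-class sampling weights for the current task, blended with
    running class counts from earlier tasks so that classes seen rarely
    so far are up-weighted now. `alpha` trades off current vs. history."""
    cur = np.asarray(class_counts, dtype=float)
    hist = np.asarray(history_counts, dtype=float)
    # Effective frequency: mix of current-task and historical counts.
    eff = alpha * cur + (1.0 - alpha) * hist
    w = 1.0 / np.maximum(eff, 1.0)   # inverse effective frequency
    return w / w.sum()               # normalize to a distribution
```

For a head class with 100 current samples and a tail class with 10, equal histories of 50 each give the tail class a larger sampling weight, evening out how often each class is seen across tasks.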