Low-Complexity Inference in Continual Learning via Compressed Knowledge Transfer

📅 2025-05-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the dual challenges of high inference overhead and catastrophic forgetting in class-incremental learning (CIL), this paper proposes a lightweight co-compression framework that integrates two-stage pruning (applied before and after training) with downstream-aware knowledge distillation. It is the first to achieve efficient, low-complexity, forgetting-resistant inference in the task-agnostic setting. The method exposes the complementary trade-off between pruning and distillation in balancing accuracy and efficiency, jointly optimizing stability and deployability. Evaluated on multiple CIL benchmarks, it significantly outperforms strong baselines, reducing inference FLOPs by 3–5× with under 1.2% accuracy degradation and establishing a state-of-the-art Pareto frontier between accuracy and latency.

📝 Abstract
Continual learning (CL) aims to train models that can learn a sequence of tasks without forgetting previously acquired knowledge. A core challenge in CL is balancing stability -- preserving performance on old tasks -- and plasticity -- adapting to new ones. Recently, large pre-trained models have been widely adopted in CL for their ability to support both, offering strong generalization for new tasks and resilience against forgetting. However, their high computational cost at inference time limits their practicality in real-world applications, especially those requiring low latency or energy efficiency. To address this issue, we explore model compression techniques, including pruning and knowledge distillation (KD), and propose two efficient frameworks tailored for class-incremental learning (CIL), a challenging CL setting where task identities are unavailable during inference. The pruning-based framework includes pre- and post-pruning strategies that apply compression at different training stages. The KD-based framework adopts a teacher-student architecture, where a large pre-trained teacher transfers downstream-relevant knowledge to a compact student. Extensive experiments on multiple CIL benchmarks demonstrate that the proposed frameworks achieve a better trade-off between accuracy and inference complexity, consistently outperforming strong baselines. We further analyze the trade-offs between the two frameworks in terms of accuracy and efficiency, offering insights into their use across different scenarios.
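The KD-based framework described in the abstract pairs a large pre-trained teacher with a compact student. A minimal sketch of the standard temperature-scaled distillation loss, in plain Python, can illustrate the mechanism; this is Hinton-style KD on softened logits, not the paper's exact "downstream-relevant" transfer objective, and the function and parameter names are illustrative:

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; a higher temperature softens the
    # distribution so the student also learns from "dark knowledge"
    # in the teacher's non-target class probabilities.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL(teacher || student) on temperature-softened outputs,
    # scaled by T^2 so gradients keep a consistent magnitude
    # across temperatures (the usual KD convention).
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))
    return temperature ** 2 * kl
```

In practice this term is combined with the student's ordinary cross-entropy loss on new-task labels, which is one way the stability/plasticity balance mentioned in the abstract can be tuned.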
Problem

Research questions and friction points this paper is trying to address.

Reducing computational cost in continual learning inference
Balancing stability and plasticity in class-incremental learning
Applying model compression for efficient knowledge transfer
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pruning-based framework for class-incremental learning
Knowledge distillation for compact student models
Balanced accuracy and inference complexity trade-off
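The pruning-based framework compresses the network by removing low-importance weights, either before or after training on the downstream tasks. A minimal illustration of global magnitude pruning, assuming a flat list of weights and a sparsity target (the paper's actual pruning criterion and schedule may differ):

```python
def magnitude_prune(weights, sparsity=0.5):
    # Zero out roughly the smallest-magnitude `sparsity` fraction of
    # weights; the rest are kept unchanged. Ties at the threshold may
    # prune slightly more than the target fraction.
    k = int(sparsity * len(weights))
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [w if abs(w) > threshold else 0.0 for w in weights]
```

Applied before training ("pre-pruning"), this fixes a sparse subnetwork that is then adapted to the task sequence; applied after ("post-pruning"), it compresses an already-trained model, typically followed by a short fine-tuning pass.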
Zhenrong Liu
Nokia Bell Labs, Espoo, Finland
Janne M. J. Huttunen
Nokia Bell Labs, Espoo, Finland
Mikko Honkala
Nokia Bell Labs
Machine Learning