AREA: Attribute Extraction and Aggregation for CLIP-Based Class-Incremental Learning

📅 2026-05-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses catastrophic forgetting in CLIP-based class-incremental learning, which arises because attribute extraction and attribute aggregation drift when training relies solely on current-task data. To mitigate this, the authors propose a two-stage decoupled framework. First, principal geodesic analysis anchors class-specific visual and textual attributes on the hyperspherical embedding space, stabilizing attribute extraction. Second, lightweight task-specific experts regularized by a variational information bottleneck stabilize attribute aggregation, and inference is routed across tasks via an optimal transport mechanism. The approach alleviates forgetting and consistently outperforms state-of-the-art methods across multiple benchmarks, improving the incremental learning capability of the base CLIP model.
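To make "principal geodesic analysis on the hyperspherical embedding space" concrete, here is a minimal NumPy sketch of textbook PGA on the unit sphere: compute the intrinsic (Fréchet) mean of L2-normalized embeddings, map every point into the tangent space at that mean with the Riemannian log map, and run PCA there. It assumes the inputs are CLIP-style feature vectors for a single class; the function names (`log_map`, `exp_map`, `frechet_mean`, `pga`) and the synthetic data are hypothetical, and the sketch does not reproduce how AREA turns these quantities into attribute anchors.

```python
import numpy as np

def log_map(mu, x, eps=1e-12):
    """Riemannian log map on the unit sphere: tangent vectors at mu pointing toward each row of x."""
    cos_t = np.clip(x @ mu, -1.0, 1.0)          # cosine of angle to mu, shape (n,)
    theta = np.arccos(cos_t)                     # geodesic distances, shape (n,)
    v = x - cos_t[:, None] * mu                  # component of x orthogonal to mu
    norm_v = np.linalg.norm(v, axis=1, keepdims=True)
    scale = np.where(norm_v > eps, theta[:, None] / np.maximum(norm_v, eps), 0.0)
    return scale * v                             # tangent vectors of length theta

def exp_map(mu, v, eps=1e-12):
    """Riemannian exp map on the unit sphere: follow the geodesic from mu along tangent vector v."""
    norm_v = np.linalg.norm(v)
    if norm_v < eps:
        return mu
    return np.cos(norm_v) * mu + np.sin(norm_v) * v / norm_v

def frechet_mean(x, iters=100, tol=1e-9):
    """Intrinsic (Karcher/Frechet) mean via fixed-point iteration on the sphere."""
    mu = x.mean(axis=0)
    mu /= np.linalg.norm(mu)                     # extrinsic mean as initialization
    for _ in range(iters):
        step = log_map(mu, x).mean(axis=0)       # average tangent direction
        mu = exp_map(mu, step)
        mu /= np.linalg.norm(mu)                 # guard against numerical drift
        if np.linalg.norm(step) < tol:
            break
    return mu

def pga(x, k):
    """Return the intrinsic mean, top-k principal geodesic directions, and their variances."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    mu = frechet_mean(x)
    v = log_map(mu, x)                           # tangent-space coordinates, shape (n, d)
    cov = v.T @ v / len(x)                       # mean of v is ~0 at the Frechet mean
    eigvals, eigvecs = np.linalg.eigh(cov)       # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]
    return mu, eigvecs[:, order].T, eigvals[order]

# Usage on synthetic "class features" clustered around one direction (hypothetical data).
rng = np.random.default_rng(0)
center = np.zeros(8)
center[0] = 1.0
feats = center + 0.1 * rng.standard_normal((200, 8))
mu, dirs, var = pga(feats, k=3)
print(round(float(mu @ center), 3))
```

The tangent vectors are all orthogonal to the mean, so the principal directions found by PCA also lie in the tangent space; mapping `mu + t * dirs[j]` back through `exp_map` traces the corresponding principal geodesic.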
📝 Abstract
Class-Incremental Learning (CIL) is important for building real-world learning systems. In CLIP-based CIL, the model performs classification by comparing the similarity between visual embeddings and textual embeddings obtained from template prompts, e.g., "a photo of a [CLASS]". This seemingly monolithic matching process can be decomposed into two conceptually distinct stages: attribute extraction and attribute aggregation. For example, a model may recognize a cat using attributes such as fur texture and whiskers. When learning a new class such as car, the model must extract additional attributes such as wheels and adjust how they are aggregated in the shared representation space. However, since only data from the current task is available, incremental updates can bias both attribute extraction and aggregation toward new classes, leading to catastrophic forgetting. Therefore, we propose AREA to handle attribute extraction and aggregation in CLIP-based CIL. To stabilize extraction, we anchor class-level visual and textual attributes on the hyperspherical embedding space via principal geodesic analysis. To stabilize aggregation, we learn lightweight task-specific experts with scoring and residual refinement, regularized by a variational information bottleneck objective. During inference, we perform routing over task attribute manifolds via optimal transport for more concise predictions. Experiments show that AREA consistently outperforms state-of-the-art methods. Code is available at https://github.com/LAMDA-CL/ICML2026-AREA.
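The abstract combines two inference-time ideas: CLIP-style matching (scaled cosine similarity between image and prompt embeddings) and routing over tasks via optimal transport. The abstract does not specify the transport cost, the marginals, or how routing interacts with the experts, so the sketch below is purely hypothetical. It assumes each task is summarized by one prototype vector (`task_protos`), uses cosine distance as the cost, solves entropic OT with Sinkhorn iterations under uniform marginals (so a test batch is spread evenly over tasks), routes each sample to the task receiving most of its mass, and then classifies among that task's classes only. All names, the `logit_scale=100.0` value, and `reg=0.05` are assumptions; the scoring, residual-refinement, and variational information bottleneck components are omitted.

```python
import numpy as np

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def clip_logits(img_emb, text_emb, logit_scale=100.0):
    """CLIP-style logits: scaled cosine similarity between image and prompt embeddings.
    logit_scale=100.0 is an assumed value for illustration."""
    return logit_scale * l2_normalize(img_emb) @ l2_normalize(text_emb).T

def sinkhorn(cost, a, b, reg=0.05, iters=500):
    """Entropic optimal transport via Sinkhorn-Knopp iterations; returns the transport plan."""
    K = np.exp(-cost / reg)                  # Gibbs kernel
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(iters):
        u = a / (K @ v)                      # rescale to match row marginals (samples)
        v = b / (K.T @ u)                    # rescale to match column marginals (tasks)
    return u[:, None] * K * v[None, :]

def route_and_classify(img_emb, task_protos, task_text_embs, reg=0.05):
    """Hypothetical OT routing: assign each test sample to a task, then classify within
    that task's classes. Returns (task ids, within-task class ids, transport plan)."""
    cost = 1.0 - l2_normalize(img_emb) @ l2_normalize(task_protos).T  # (n, T) cosine distance
    n, t = cost.shape
    plan = sinkhorn(cost, np.full(n, 1.0 / n), np.full(t, 1.0 / t), reg)
    tasks = plan.argmax(axis=1)
    classes = np.array([
        clip_logits(img_emb[i:i + 1], task_text_embs[k]).argmax()
        for i, k in enumerate(tasks)
    ])
    return tasks, classes, plan

# Usage with toy 4-d embeddings: two tasks, two classes per task (hypothetical data).
img = np.array([[1.0, 0.1, 0.0, 0.0],
                [1.0, 0.0, 0.1, 0.0],
                [0.1, 1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0, 0.1]])
protos = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0, 0.0]])
texts = [np.array([[1.0, 0.2, 0.0, 0.0], [1.0, 0.0, 0.2, 0.0]]),   # task 0 prompt embeddings
         np.array([[0.2, 1.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.2]])]   # task 1 prompt embeddings
tasks, classes, plan = route_and_classify(img, protos, texts)
print(tasks, classes)
```

Uniform column marginals act as a batch-level balance constraint: unlike independent per-sample argmax routing, OT discourages sending every sample to whichever task the model currently favors, which is one plausible reason to prefer transport-based routing after incremental updates.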
Problem

Research questions and friction points this paper is trying to address.

Class-Incremental Learning
CLIP
Catastrophic Forgetting
Attribute Extraction
Attribute Aggregation
Innovation

Methods, ideas, or system contributions that make the work stand out.

attribute extraction
attribute aggregation
class-incremental learning
CLIP
optimal transport