Hierarchical Semantic Tree Anchoring for CLIP-Based Class-Incremental Learning

📅 2025-11-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing CLIP-based continual learning methods neglect the semantic hierarchy inherent in vision-language concepts, leading to fine-grained feature drift and catastrophic forgetting. To address this, the paper proposes HASTEN (Hierarchical Semantic Tree Anchoring), a framework that uses an external knowledge graph as supervision to embed visual and textual features in hyperbolic space, modeling inter-class hypernymy–hyponymy relationships and preserving hierarchical structure as data evolves. To mitigate interference from historical tasks, HASTEN additionally projects gradients onto the null space of the shared hyperbolic mapper. Evaluated on multiple standard benchmarks, HASTEN achieves significant improvements over state-of-the-art methods while delivering strong generalization and a unified, interpretable, structured representation, bridging semantic hierarchy, geometric representation, and optimization stability in a principled manner.

📝 Abstract
Class-Incremental Learning (CIL) enables models to learn new classes continually while preserving past knowledge. Recently, vision-language models like CLIP offer transferable features via multi-modal pre-training, making them well-suited for CIL. However, real-world visual and linguistic concepts are inherently hierarchical: a textual concept like "dog" subsumes fine-grained categories such as "Labrador" and "Golden Retriever," and each category entails its images. But existing CLIP-based CIL methods fail to explicitly capture this inherent hierarchy, leading to drift in fine-grained class features during incremental updates and ultimately to catastrophic forgetting. To address this challenge, we propose HASTEN (Hierarchical Semantic Tree Anchoring), which anchors hierarchical information into CIL to reduce catastrophic forgetting. First, we employ an external knowledge graph as supervision to embed visual and textual features in hyperbolic space, effectively preserving hierarchical structure as data evolves. Second, to mitigate catastrophic forgetting, we project gradients onto the null space of the shared hyperbolic mapper, preventing interference with prior tasks. These two steps work synergistically to enable the model to resist forgetting by maintaining hierarchical relationships. Extensive experiments show that HASTEN consistently outperforms existing methods while providing a unified structured representation.
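The abstract's first step, embedding hierarchically related concepts in hyperbolic space, can be illustrated with a minimal sketch of standard Poincaré ball operations. This is not HASTEN's actual mapper or training objective (neither is specified here); the curvature, coordinates, and concept names below are illustrative assumptions.

```python
import numpy as np

def exp_map_zero(v, c=1.0):
    """Exponential map at the origin of the Poincare ball with curvature -c:
    lifts a Euclidean (tangent-space) vector v onto the ball."""
    norm = np.linalg.norm(v)
    if norm < 1e-9:
        return np.zeros_like(v)
    return np.tanh(np.sqrt(c) * norm) * v / (np.sqrt(c) * norm)

def poincare_distance(x, y, c=1.0):
    """Geodesic distance between two points inside the Poincare ball."""
    num = 2 * c * np.linalg.norm(x - y) ** 2
    den = (1 - c * np.linalg.norm(x) ** 2) * (1 - c * np.linalg.norm(y) ** 2)
    return np.arccosh(1 + num / den) / np.sqrt(c)

# Toy hierarchy: the parent concept "dog" sits near the origin, while its
# fine-grained children are pushed toward the boundary of the ball.
dog = exp_map_zero(np.array([0.1, 0.0]))
labrador = exp_map_zero(np.array([0.8, 0.1]))
golden = exp_map_zero(np.array([0.8, -0.1]))

# Distances grow rapidly toward the boundary, which is why hyperbolic space
# can embed tree-structured (hypernym/hyponym) relations with low distortion.
sibling_gap = poincare_distance(labrador, golden)
parent_gap = poincare_distance(labrador, dog)
```

In this toy layout the two siblings end up closer to each other than to the geodesically distant region around the parent, and both distances exceed their Euclidean counterparts, reflecting the tree-like geometry the paper relies on.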
Problem

Research questions and friction points this paper is trying to address.

Capturing hierarchical semantic relationships in class-incremental learning
Preventing fine-grained feature drift during incremental model updates
Reducing catastrophic forgetting via hyperbolic space embeddings
Innovation

Methods, ideas, or system contributions that make the work stand out.

Embedding visual and textual features in hyperbolic space
Projecting gradients onto null space of mapper
Maintaining hierarchical relationships to resist forgetting
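The second innovation, projecting gradients onto the null space of the shared mapper so that updates do not perturb outputs on past tasks, can be sketched as follows. This assumes a linear mapper and an SVD-based null-space estimate over stored past-task features; HASTEN's exact procedure, thresholds, and mapper architecture are not given here, so all names and values are illustrative.

```python
import numpy as np

def null_space_projector(feats, eps=1e-3):
    """Projector onto the approximate null space of past-task features.
    feats: (n_samples, d) inputs previously seen by the shared mapper."""
    cov = feats.T @ feats / len(feats)        # uncentered covariance, (d, d)
    U, S, _ = np.linalg.svd(cov)
    keep = S > eps * S[0]                     # directions used by old tasks
    U_used = U[:, keep]
    return np.eye(feats.shape[1]) - U_used @ U_used.T

rng = np.random.default_rng(0)
d = 8
# Synthetic old-task features confined to a 3-dim subspace of R^8.
basis = rng.standard_normal((d, 3))
old_feats = rng.standard_normal((200, 3)) @ basis.T

P = null_space_projector(old_feats)

# Raw gradient for a linear mapper W of shape (out_dim, d); projecting its
# input side removes the components that would change old-task outputs.
grad = rng.standard_normal((5, d))
grad_proj = grad @ P

# grad_proj @ old_feats.T is (numerically) zero: the update is invisible
# to every stored past-task feature, so prior responses are preserved.
```

The design choice mirrors null-space continual-learning methods: new-task learning is restricted to directions orthogonal to the subspace spanned by old-task inputs, trading some plasticity for stability.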