ESSENTIAL: Episodic and Semantic Memory Integration for Video Class-Incremental Learning

📅 2025-08-14

📈 Citations: 0

✨ Influential: 0

career value

161K/year

🤖 AI Summary

To address the challenge of balancing memory efficiency and classification performance in video continual incremental learning (VCIL), this paper proposes an efficient framework integrating episodic and semantic memory. Methodologically: (i) key-frame temporal features are sparsely sampled and stored in episodic memory; (ii) learnable, class-level prompt vectors are maintained in semantic memory; and (iii) a cross-attention memory retrieval module enables dual-memory–driven feature reconstruction and knowledge consolidation. This design effectively mitigates catastrophic forgetting while achieving both low memory overhead and high discriminability. Evaluated on the TCD and vCLIMB benchmarks, our method reduces memory consumption by 30%–50% compared to prior work, yet achieves average accuracy gains of 2.1–4.7 percentage points over state-of-the-art approaches, demonstrating its effectiveness and practicality.

Technology Category

Application Category

📝 Abstract

In this work, we tackle the problem of video classincremental learning (VCIL). Many existing VCIL methods mitigate catastrophic forgetting by rehearsal training with a few temporally dense samples stored in episodic memory, which is memory-inefficient. Alternatively, some methods store temporally sparse samples, sacrificing essential temporal information and thereby resulting in inferior performance. To address this trade-off between memory-efficiency and performance, we propose EpiSodic and SEmaNTIc memory integrAtion for video class-incremental Learning (ESSENTIAL). ESSENTIAL consists of episodic memory for storing temporally sparse features and semantic memory for storing general knowledge represented by learnable prompts. We introduce a novel memory retrieval (MR) module that integrates episodic memory and semantic prompts through cross-attention, enabling the retrieval of temporally dense features from temporally sparse features. We rigorously validate ESSENTIAL on diverse datasets: UCF-101, HMDB51, and Something-Something-V2 from the TCD benchmark and UCF-101, ActivityNet, and Kinetics-400 from the vCLIMB benchmark. Remarkably, with significantly reduced memory, ESSENTIAL achieves favorable performance on the benchmarks.

Problem

Research questions and friction points this paper is trying to address.

Balancing memory-efficiency and performance in video class-incremental learning

Integrating episodic and semantic memory for better temporal feature retrieval

Reducing catastrophic forgetting in VCIL with sparse features and learnable prompts

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates episodic and semantic memory for VCIL

Uses cross-attention for memory retrieval

Stores sparse features and learnable prompts

🔎 Similar Papers

Deep Common Feature Mining for Efficient Video Semantic Segmentation