Unlocking Compositional Generalization in Continual Few-Shot Learning

📅 2026-05-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing continual few-shot learning methods struggle with compositional generalization because they either collapse scenes into global embeddings or train with part-level matching objectives that tie representations to previously seen patterns. This work proposes a paradigm that decouples representation learning from compositional inference. During training, slot representations built on the patch-level features of a self-supervised Vision Transformer are optimized toward holistic class identity, which preserves their object-level structure, while the backbone stays frozen to prevent representation drift. During inference, the preserved slots are dynamically composed to match novel scenes. Because only this lightweight holistic optimization is applied, the features retain their capacity to transfer to novel concepts. The authors report state-of-the-art generalization to unseen concepts and minimal forgetting on standard continual learning benchmarks.
📝 Abstract
Object-centric representations promise a key property for few-shot learning: Rather than treating a scene as a single unit, a model can decompose it into individual object-level parts that can be matched and compared across different concepts. In practice, this potential is rarely realized. Continual learners either collapse scenes into global embeddings, or train with part-level matching objectives that tie representations too closely to seen patterns, leaving them unable to generalize to truly novel concepts. In this paper, we identify this fundamental structural conflict and pioneer a new paradigm that strictly decouples representation learning from compositional inference. Leveraging the inherent patch-level semantic geometry of self-supervised Vision Transformers (ViTs), our framework employs a dual-phase strategy. During training, slot representations are optimized entirely toward holistic class identity, preserving highly generalizable, object-level geometries. At inference, preserved slots are dynamically composed to match novel scenes. We demonstrate that this paradigm offers dual structural benefits: The frozen backbone naturally prevents representation drift, while our lightweight, holistic optimization preserves the features' capacity for novel-concept transfer. Extensive experiments validate this approach, achieving state-of-the-art unseen-concept generalization and minimal forgetting across standard continual learning benchmarks.
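The inference-time idea in the abstract, composing preserved slots to match a novel scene, can be illustrated with a small sketch. This is not the authors' implementation: the slot vectors are hand-written stand-ins for what a frozen backbone would produce, and the scoring rule (each query slot takes its best cosine match among a class's support slots) and the function names `compositional_score` and `classify` are assumptions chosen only for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def compositional_score(query_slots, support_slots):
    """Assumed matching rule: every query slot picks its most similar
    support slot, and the per-slot best matches are averaged."""
    best = [max(cosine(q, s) for s in support_slots) for q in query_slots]
    return sum(best) / len(best)

def classify(query_slots, support_by_class):
    """Return the class whose support slots best cover the query slots."""
    return max(
        support_by_class,
        key=lambda label: compositional_score(query_slots, support_by_class[label]),
    )

# Toy data (hypothetical): each inner list is one slot vector.
support = {
    "A": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    "B": [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]],
}
query = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]

predicted = classify(query, support)
print(predicted)
```

In this toy setup both classes share one slot, so a single pooled embedding would blur them, while slot-level matching separates them through the slot they do not share. No backbone weights are updated at this stage, which mirrors the frozen-backbone setting described above.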
Problem

Research questions and friction points this paper is trying to address.

compositional generalization
continual few-shot learning
object-centric representations
novel concept transfer
representation drift
Innovation

Methods, ideas, or system contributions that make the work stand out.

compositional generalization
continual few-shot learning
object-centric representation
Vision Transformers
representation decoupling
Phu-Quy Nguyen-Lam
Faculty of Information Technology, University of Science, Vietnam National University, Ho Chi Minh City, Vietnam
Phu-Hoa Pham
Faculty of Information Technology, University of Science, Vietnam National University, Ho Chi Minh City, Vietnam
Dao Sy Duy Minh
Faculty of Information Technology, University of Science, Vietnam National University, Ho Chi Minh City, Vietnam
Chi-Nguyen Tran
Faculty of Information Technology, University of Science, Vietnam National University, Ho Chi Minh City, Vietnam
Huynh Trung Kiet
Faculty of Information Technology, University of Science, Vietnam National University, Ho Chi Minh City, Vietnam
Long Tran-Thanh
Professor in Computer Science, University of Warwick
Artificial Intelligence, AI for social good, game theory, human-agent learning, multi-armed bandits