Freeze and Cluster: A Simple Baseline for Rehearsal-Free Continual Category Discovery

📅 2025-03-12
🤖 AI Summary
This paper addresses Rehearsal-Free Continual Category Discovery (RF-CCD): continually discovering novel categories in unlabeled data, using knowledge from labeled data, without storing past data. Existing methods typically train from scratch and often store data to prevent forgetting. The authors instead propose a minimal baseline: estimate the number of novel classes from prior knowledge of the known categories, extract representations with a frozen model trained on the base classes, generate pseudo-labels with k-means clustering, and train only the classifier layer. By integrating advances from continual learning and novel class discovery, the work reports state-of-the-art results and finds that, when a pre-trained model is available, the representation does not improve and may even degrade once unlabeled data is introduced. Experiments on Stanford Cars, CUB, iNat, and Tiny-ImageNet support these findings and the effectiveness of the baseline.

📝 Abstract
This paper addresses the problem of Rehearsal-Free Continual Category Discovery (RF-CCD), which focuses on continuously identifying novel classes by leveraging knowledge from labeled data. Existing methods typically train from scratch, overlooking the potential of base models, and often resort to data storage to prevent forgetting. Moreover, because RF-CCD encompasses both continual learning and novel class discovery, previous approaches have struggled to effectively integrate advanced techniques from these fields, resulting in less convincing comparisons and failing to reveal the unique challenges posed by RF-CCD. To address these challenges, we integrate advancements from both domains and conduct extensive experiments and analyses. Our findings demonstrate that this integration can achieve state-of-the-art results, and they lead to the conclusion that, in the presence of pre-trained models, the representation does not improve and may even degrade with the introduction of unlabeled data. To mitigate representation degradation, we propose a straightforward yet highly effective baseline method. This method first utilizes prior knowledge of known categories to estimate the number of novel classes. It then acquires representations using a model specifically trained on the base classes, generates high-quality pseudo-labels through k-means clustering, and trains only the classifier layer. We validate our conclusions and methods by conducting extensive experiments across multiple benchmarks, including the Stanford Cars, CUB, iNat, and Tiny-ImageNet datasets. The results clearly illustrate our findings, demonstrate the effectiveness of our baseline, and pave the way for future advancements in RF-CCD.
Problem

Research questions and friction points this paper is trying to address.

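Continually discovering novel categories without rehearsal, i.e., without storing past data
Integrating advances from continual learning and novel class discovery, which prior work has struggled to combine
Mitigating the representation degradation that occurs when unlabeled data is introduced to a pre-trained model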
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extracts representations with a frozen model trained on the base classes
Generates pseudo-labels for unlabeled data with k-means clustering
Trains only the classifier layer to mitigate representation degradation
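
The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the features are assumed to have been extracted once by the frozen model, the number of novel classes `k` is assumed to be already estimated, and the function names (`kmeans`, `train_linear_head`, `fit_head_on_pseudo_labels`), the farthest-point initialization, and the softmax-regression head are all hypothetical choices made for the example.

```python
import numpy as np


def _sq_dists(features, centroids):
    # Pairwise squared Euclidean distances, shape (n_samples, n_clusters).
    return ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)


def kmeans(features, k, n_iter=50):
    """Lloyd's k-means with deterministic farthest-point initialization
    (an assumption of this sketch; any k-means variant could be used)."""
    centroids = features[:1].astype(float)
    for _ in range(1, k):
        nearest = _sq_dists(features, centroids).min(axis=1)
        centroids = np.vstack([centroids, features[nearest.argmax()]])
    for _ in range(n_iter):
        assign = _sq_dists(features, centroids).argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):  # leave an empty cluster's centroid unchanged
                centroids[j] = features[assign == j].mean(axis=0)
    return _sq_dists(features, centroids).argmin(axis=1), centroids


def train_linear_head(features, labels, k, lr=0.05, epochs=300):
    """Softmax regression by full-batch gradient descent.
    Only W and b are updated; the features are never changed."""
    n, d = features.shape
    W, b = np.zeros((d, k)), np.zeros(k)
    onehot = np.eye(k)[labels]
    for _ in range(epochs):
        logits = features @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (features.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b


def fit_head_on_pseudo_labels(frozen_features, k):
    """Cluster frozen features, then fit the head on the cluster ids."""
    pseudo_labels, _ = kmeans(frozen_features, k)
    W, b = train_linear_head(frozen_features, pseudo_labels, k)
    return pseudo_labels, W, b
```

Because the backbone is never updated in this sketch, its features can be computed once per session and cached, and the only trainable parameters are `W` and `b`. How the new outputs are combined with the classifier for previously known classes is not covered here.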
Authors

Chuyu Zhang
ShanghaiTech University
Computer Vision, Machine Learning
Xueyang Yu
UMass Amherst
Computer Vision, Multimodal
Peiyan Gu
ShanghaiTech University, Shanghai, China
Xuming He
ShanghaiTech University, Shanghai, China; Shanghai Engineering Research Center of Intelligent Vision and Imaging, Shanghai, China