🤖 AI Summary
This work addresses online fine-grained category discovery: recognizing known and unknown classes in streaming data in real time, particularly under scarce annotation, where existing methods generalize poorly. We propose the first diffusion-based multi-stage framework for this task, which (1) synthesizes diverse, semantically meaningful samples through cross-image interpolation and attribute composition in the latent space; (2) applies a diversity-driven filtering mechanism to improve synthetic sample quality; and (3) uses semi-supervised leader encoding to inject synthetic knowledge into feature learning. The approach brings diffusion models into online category discovery, jointly modeling controllable attribute-aware generation and synthetic-knowledge-guided representation learning. Extensive experiments on six fine-grained benchmarks demonstrate significant improvements over state-of-the-art methods in recognition accuracy and robustness for both known and previously unseen categories under streaming and low-label regimes.
📝 Abstract
In this paper, we investigate a practical yet challenging task: On-the-fly Category Discovery (OCD). The task is the online identification of newly arriving stream data that may belong to either known or unknown categories, using category knowledge learned only from labeled data. Existing OCD methods focus on fully mining transferable knowledge from the labeled data alone. However, the transferability these methods learn is limited because the knowledge contained in known categories is often insufficient, especially in fine-grained recognition when few annotated samples or categories are available. To mitigate this limitation, we propose a diffusion-based OCD framework, dubbed DiffGRE, which integrates Generation, Refinement, and Encoding in a multi-stage fashion. Specifically, we first design an attribute-composition generation method based on cross-image interpolation in the diffusion latent space to synthesize novel samples. Then, we propose a diversity-driven refinement approach that selects the synthesized images that differ from known categories for subsequent OCD model training. Finally, we leverage semi-supervised leader encoding to inject the additional category knowledge contained in the synthesized data into the OCD models, which benefits the discovery of both known and unknown categories during on-the-fly inference. Extensive experiments on six fine-grained datasets demonstrate the superiority of DiffGRE over previous methods.
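The Generation and Refinement stages described above can be illustrated with a minimal sketch. The abstract does not specify the interpolation scheme or the diversity criterion, so this example assumes spherical interpolation between two latent codes and a cosine-similarity threshold against known-class prototype features. The names `slerp`, `diversity_filter`, and `max_sim` are hypothetical and are not taken from DiffGRE.

```python
import numpy as np


def slerp(z_a, z_b, t):
    """Spherically interpolate between two latent vectors.

    Assumption: the paper only says "cross-image interpolation"; slerp is
    one common choice for diffusion latents, used here for illustration.
    """
    a = z_a / np.linalg.norm(z_a)
    b = z_b / np.linalg.norm(z_b)
    # Angle between the two latent directions.
    omega = np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))
    if omega < 1e-6:
        # Nearly parallel vectors: fall back to linear interpolation.
        return (1.0 - t) * z_a + t * z_b
    s = np.sin(omega)
    return (np.sin((1.0 - t) * omega) / s) * z_a + (np.sin(t * omega) / s) * z_b


def diversity_filter(synth_feats, known_protos, max_sim=0.8):
    """Return a boolean mask over synthesized samples.

    Assumption: a sample counts as "different from known categories" when
    its cosine similarity to every known-class prototype is below max_sim.
    """
    s = synth_feats / np.linalg.norm(synth_feats, axis=1, keepdims=True)
    p = known_protos / np.linalg.norm(known_protos, axis=1, keepdims=True)
    sims = s @ p.T  # shape: (num_synth, num_known)
    return sims.max(axis=1) < max_sim
```

In this sketch, latents from two images of different known categories would be mixed with `slerp`, decoded by a diffusion model (not shown), embedded by the feature extractor, and then passed through `diversity_filter` so that only samples far from all known-class prototypes are kept for training.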