Generalized Fine-Grained Category Discovery with Multi-Granularity Conceptual Experts

📅 2025-09-30
🤖 AI Summary
Generalized Category Discovery (GCD) aims to cluster unlabeled data containing both known and unknown categories by leveraging partial supervision from known classes. Existing methods have two key limitations: they do not model semantics at multiple granularities, and most rely on a pre-specified number of unknown categories. To address these, the paper proposes the Multi-Granularity Conceptual Experts (MGCE) framework, which combines Dynamic Conceptual Contrastive Learning (DCCL) with Multi-Granularity Experts Collaborative Learning (MECL). MGCE adaptively mines visual concepts and estimates the number of categories in the unlabeled data without prior knowledge of that count. The approach uses dual-level representation learning, experts at several granularities, and a concept alignment matrix for cross-expert collaboration. Evaluated on nine fine-grained visual recognition benchmarks, MGCE achieves state-of-the-art results, particularly in novel-class accuracy, and outperforms parametric methods that require the exact number of categories by 3.6% on average.

📝 Abstract
Generalized Category Discovery (GCD) is an open-world problem that clusters unlabeled data by leveraging knowledge from partially labeled categories. A key challenge is that unlabeled data may contain both known and novel categories. Existing approaches suffer from two main limitations. First, they fail to exploit multi-granularity conceptual information in visual data, which limits representation quality. Second, most assume that the number of unlabeled categories is known during training, which is impractical in real-world scenarios. To address these issues, we propose a Multi-Granularity Conceptual Experts (MGCE) framework that adaptively mines visual concepts and integrates multi-granularity knowledge for accurate category discovery. MGCE consists of two modules: (1) Dynamic Conceptual Contrastive Learning (DCCL), which alternates between concept mining and dual-level representation learning to jointly optimize feature learning and category discovery; and (2) Multi-Granularity Experts Collaborative Learning (MECL), which extends the single-expert paradigm by introducing additional experts at different granularities and by employing a concept alignment matrix for effective cross-expert collaboration. Importantly, MGCE can automatically estimate the number of categories in unlabeled data, making it suitable for practical open-world settings. Extensive experiments on nine fine-grained visual recognition benchmarks demonstrate that MGCE achieves state-of-the-art results, particularly in novel-class accuracy. Notably, even without prior knowledge of category numbers, MGCE outperforms parametric approaches that require knowing the exact number of categories, with an average improvement of 3.6%. Code is available at https://github.com/HaiyangZheng/MGCE.
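The abstract mentions a concept alignment matrix for cross-expert collaboration but does not say how it is built. As a rough illustration only, one way to relate concepts found at two granularities is a row-normalized co-occurrence matrix over per-sample assignments. The function name `alignment_matrix`, the inputs `fine_ids` and `coarse_ids`, and the co-occurrence construction itself are assumptions made for this sketch; they are not taken from the paper or its code.

```python
from collections import Counter


def alignment_matrix(fine_ids, coarse_ids):
    """Row-normalized co-occurrence between two concept assignments.

    Hypothetical helper: entry [f][c] is the fraction of samples in
    fine-grained concept f that fall into coarse-grained concept c.
    This is an illustrative assumption, not the MGCE definition.
    """
    if len(fine_ids) != len(coarse_ids):
        raise ValueError("both assignments must cover the same samples")
    if not fine_ids:
        return []

    n_fine = max(fine_ids) + 1
    n_coarse = max(coarse_ids) + 1

    # Count how often each (fine, coarse) pair occurs on the same sample.
    counts = Counter(zip(fine_ids, coarse_ids))

    matrix = []
    for f in range(n_fine):
        row = [counts[(f, c)] for c in range(n_coarse)]
        total = sum(row)
        # Leave rows for unused concept ids as all zeros.
        matrix.append([v / total if total else 0.0 for v in row])
    return matrix


# Toy example: six samples, four fine concepts, two coarse concepts.
fine = [0, 0, 1, 2, 2, 3]
coarse = [0, 0, 0, 1, 1, 1]
A = alignment_matrix(fine, coarse)
for row in A:
    print(row)
```

In this toy case, fine concepts 0 and 1 map entirely to coarse concept 0, and fine concepts 2 and 3 map entirely to coarse concept 1. Such a matrix could in principle be used to translate predictions from one expert into the label space of another, though whether MGCE does this is not stated here.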
Problem

Research questions and friction points this paper is trying to address.

Clustering unlabeled data containing both known and novel categories
Exploiting multi-granularity conceptual information in visual data
Automatically estimating category numbers without prior knowledge
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-granularity experts mine visual concepts adaptively
Dynamic contrastive learning jointly optimizes features and categories
Automatically estimates category numbers without prior knowledge
Haiyang Zheng
Department of Information Engineering and Computer Science, University of Trento, Trento, Italy
Nan Pu
University of Trento, Leiden University
Deep Learning
Wenjing Li
School of Computer Science and Information Engineering, Hefei University of Technology, China
Nicu Sebe
University of Trento
Computer Vision, Multimedia
Zhun Zhong
Hefei University of Technology & University of Nottingham
Computer Vision