VLM-NCD: Novel Class Discovery with Vision-Based Large Language Models

📅 2025-12-10

🤖 AI Summary
Problem: Existing vision-only methods for Novel Class Discovery (NCD) on unlabeled data suffer from insufficient feature discriminability and poor robustness to long-tailed class distributions.
Method: The paper proposes a multimodal NCD framework that integrates vision-text semantic priors: (1) image and text features of known classes are modeled jointly to construct semantic prototypes and cluster centers; (2) a two-stage discovery mechanism driven by semantic affinity and adaptive clustering, which the authors present as the first in NCD to be robust to long-tailed distributions; and (3) prototype-guided clustering combined with semantic-affinity thresholds that dynamically separate known-class from novel-class samples.
Results: On CIFAR-100, the method achieves up to 25.3% higher accuracy on unknown classes than state-of-the-art approaches and substantially mitigates the performance degradation caused by long-tailed class imbalance.
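A minimal sketch of the first component, building one prototype per known class from its image and text features. It assumes image and text embeddings already live in a shared space (as with a CLIP-style encoder) and replaces the paper's joint optimisation with a fixed weighted average; `build_prototypes` and its weight `alpha` are hypothetical names for illustration, not the authors' code.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def build_prototypes(
    image_feats: Dict[str, List[Vector]],
    text_feats: Dict[str, Vector],
    alpha: float = 0.5,
) -> Dict[str, Vector]:
    """Hypothetical fixed fusion of per-class image and text embeddings.

    Each known class gets one unit-length prototype: a weighted sum of its
    normalized visual centroid (weight alpha) and its normalized text
    embedding (weight 1 - alpha). The paper optimises these jointly; this
    fixed convex combination is only an illustrative stand-in.
    """
    prototypes: Dict[str, Vector] = {}
    for cls, feats in image_feats.items():
        # Normalize each image feature first so no single sample dominates the centroid.
        visual = l2_normalize(mean_vector([l2_normalize(f) for f in feats]))
        textual = l2_normalize(text_feats[cls])
        fused = [alpha * v + (1 - alpha) * t for v, t in zip(visual, textual)]
        prototypes[cls] = l2_normalize(fused)
    return prototypes


if __name__ == "__main__":
    protos = build_prototypes(
        image_feats={"cat": [[1.0, 0.0], [0.8, 0.2]], "dog": [[0.0, 1.0]]},
        text_feats={"cat": [1.0, 0.0], "dog": [0.0, 1.0]},
    )
    print(protos)
```

Normalizing both modalities before mixing keeps `alpha` meaningful as a relative weight; without it, whichever encoder produces larger-norm vectors would dominate the prototype.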


📝 Abstract
Novel Class Discovery (NCD) aims to utilise prior knowledge of known classes to classify and discover unknown classes in unlabelled data. Existing NCD methods for images rely primarily on visual features, which are limited by insufficient feature discriminability and by the long-tailed distribution of the data. We propose LLM-NCD, a multimodal framework that breaks this bottleneck by fusing visual-textual semantics with prototype-guided clustering. Our key innovations are (i) modelling the cluster centres and semantic prototypes of known classes by jointly optimising known-class image and text features, and (ii) a dual-phase discovery mechanism that dynamically separates known and novel samples via semantic affinity thresholds and adaptive clustering. Experiments on the CIFAR-100 dataset show that, compared with current methods, our method improves accuracy on unknown classes by up to 25.3%. Notably, our method shows unique resilience to long-tailed distributions, a first in the NCD literature.
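The dual-phase discovery mechanism can be sketched as follows, under stated assumptions. Phase 1 assigns an unlabelled sample to its most similar known-class prototype when the cosine affinity reaches a threshold `tau`, and marks it novel otherwise; phase 2 clusters the novel samples. Plain k-means with a known number of novel classes `k_novel` stands in for the paper's adaptive clustering, whose details are not given here, and all function names and the default `tau=0.7` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split_known_novel(
    samples: List[Vector], prototypes: Dict[str, Vector], tau: float
) -> Tuple[Dict[int, str], List[int]]:
    """Phase 1: a sample whose best prototype affinity reaches tau gets that
    known-class label; every other sample is treated as novel."""
    known: Dict[int, str] = {}
    novel: List[int] = []
    for i, x in enumerate(samples):
        cls, score = max(
            ((c, cosine(x, p)) for c, p in prototypes.items()), key=lambda t: t[1]
        )
        if score >= tau:
            known[i] = cls
        else:
            novel.append(i)
    return known, novel


def kmeans(points: List[Vector], k: int, iters: int = 20) -> List[int]:
    """Phase 2 stand-in: Lloyd's k-means with deterministic farthest-point init."""
    if not points:
        return []
    k = min(k, len(points))
    centers = [list(points[0])]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels: List[int] = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous center if a cluster empties out
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def discover(
    samples: List[Vector],
    prototypes: Dict[str, Vector],
    tau: float = 0.7,
    k_novel: int = 2,
) -> Tuple[Dict[int, str], Dict[int, int]]:
    """Run both phases; results are keyed by sample index."""
    known, novel_idx = split_known_novel(samples, prototypes, tau)
    clusters = kmeans([samples[i] for i in novel_idx], k_novel)
    return known, dict(zip(novel_idx, clusters))


if __name__ == "__main__":
    protos = {"cat": [1.0, 0.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0, 0.0]}
    data = [[0.9, 0.1, 0, 0], [0.1, 0.9, 0, 0], [0, 0.1, 1.0, 0],
            [0.1, 0, 0.9, 0], [0, 0, 0.1, 1.0], [0, 0.1, 0, 0.9]]
    print(discover(data, protos))
```

A fixed threshold trades off two errors: raising `tau` pushes more known-class samples into the novel pool, while lowering it absorbs genuinely novel samples into known classes. The abstract describes the separation as dynamic, whereas this sketch keeps `tau` fixed.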
Problem

Research questions and friction points this paper is trying to address.

Discovers unknown classes using knowledge of known classes
Overcomes the limitations of purely visual features in image classification
Improves accuracy on novel classes under long-tailed distributions
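Accuracy on novel classes cannot be computed by direct label comparison, because discovered cluster ids carry no class names. In the NCD literature it is commonly measured as clustering accuracy: the fraction of samples labelled correctly under the best one-to-one mapping from clusters to ground-truth classes. This page does not state the paper's exact evaluation protocol, so the following is a sketch of that common metric, not necessarily the authors' implementation; `cluster_accuracy` is a hypothetical name.

```python
from collections import Counter
from itertools import permutations
from typing import List, Optional, Sequence


def cluster_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Accuracy under the best one-to-one mapping from cluster ids to class ids.

    Brute force over permutations, so only usable for a handful of clusters;
    at realistic sizes the same assignment problem is solved with the
    Hungarian algorithm (e.g. scipy.optimize.linear_sum_assignment).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        return 0.0
    true_ids: List[Optional[int]] = sorted(set(y_true))
    pred_ids: List[Optional[int]] = sorted(set(y_pred))
    n = max(len(true_ids), len(pred_ids))
    # Pad both sides with None so they have n slots; pairs involving None count zero.
    true_slots = true_ids + [None] * (n - len(true_ids))
    pred_slots = pred_ids + [None] * (n - len(pred_ids))
    # counts[(p, t)] = number of samples in cluster p whose true class is t
    counts = Counter(zip(y_pred, y_true))
    best = max(
        sum(counts[(p, t)] for p, t in zip(pred_slots, perm))
        for perm in permutations(true_slots)
    )
    return best / len(y_true)


if __name__ == "__main__":
    # Cluster ids are permuted relative to class ids; one sample is misclustered.
    print(cluster_accuracy([0, 0, 1, 1, 2, 2], [1, 1, 0, 2, 2, 2]))
```

Because the metric optimises over mappings, relabelling clusters never changes the score; only how samples are grouped matters.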
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fuses visual-textual semantics for enhanced feature discriminability
Uses prototype-guided clustering to model known-class centers
Implements dual-phase discovery with semantic affinity thresholds and adaptive clustering
Yuetong Su
School of Electronics and Information, Northwestern Polytechnical University, Xi’an, China
Baoguo Wei
School of Electronics and Information, Northwestern Polytechnical University, Xi’an, China
Xinyu Wang
School of Electronics and Information, Northwestern Polytechnical University, Xi’an, China
Xu Li
School of Electronics and Information, Northwestern Polytechnical University, Xi’an, China
Lixin Li
Georgia Southern University
Medical Imaging, Databases, Spatiotemporal Interpolation