Multi-Modal Representation Learning via Semi-Supervised Rate Reduction for Generalized Category Discovery

📅 2026-02-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses generalized category discovery (GCD), an open-set recognition problem in which only part of the data from known classes is labeled and samples from unknown categories must also be identified. To tackle this problem, we propose the SSR²-GCD framework, which brings semi-supervised rate reduction to multi-modal representation learning for GCD. The method emphasizes intra-modal alignment to obtain structured feature distributions, and integrates prompt candidates that exploit the inter-modal alignment of vision-language models (VLMs) to strengthen knowledge transfer. Extensive experiments on both generic and fine-grained benchmark datasets show that SSR²-GCD outperforms existing approaches.

📝 Abstract
Generalized Category Discovery (GCD) aims to identify both known and unknown categories when only partial labels are given for the known categories, which poses a challenging open-set recognition problem. State-of-the-art approaches to the GCD task are usually built on multi-modality representation learning, which depends heavily on inter-modality alignment. However, few of them enforce a proper intra-modality alignment that yields a desired underlying structure in the representation distributions. In this paper, we propose a novel and effective multi-modal representation learning framework for GCD via Semi-Supervised Rate Reduction, called SSR$^2$-GCD, which learns cross-modality representations with desired structural properties by emphasizing the proper alignment of intra-modality relationships. Moreover, to boost knowledge transfer, we integrate prompt candidates by leveraging the inter-modal alignment offered by Vision Language Models. We conduct extensive experiments on generic and fine-grained benchmark datasets, demonstrating the superior performance of our approach.
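The abstract does not spell out the rate reduction objective. As a rough illustration only, the sketch below computes the standard (fully supervised) coding rate reduction, which rewards features that span a large volume overall while each group stays compact. The function names, the `eps` value, and the use of hard group labels are assumptions made for this example; how SSR$^2$-GCD handles unlabeled samples or multiple modalities is not stated here and may differ.

```python
import numpy as np

def coding_rate(Z, eps=0.5):
    """Coding rate 1/2 * logdet(I + d / (n * eps^2) * Z^T Z).

    Z holds one feature vector per row, shape (n, d).
    eps is an assumed distortion parameter, not a value from the paper.
    """
    n, d = Z.shape
    gram = Z.T @ Z  # (d, d) second-moment matrix of the features
    _, logdet = np.linalg.slogdet(np.eye(d) + (d / (n * eps**2)) * gram)
    return 0.5 * logdet

def rate_reduction(Z, labels, eps=0.5):
    """Rate of all features minus the size-weighted rates of each group."""
    labels = np.asarray(labels)
    n = Z.shape[0]
    within = 0.0
    for k in np.unique(labels):
        Zk = Z[labels == k]
        within += (Zk.shape[0] / n) * coding_rate(Zk, eps)
    return coding_rate(Z, eps) - within

# Toy check: two orthogonal groups of features.
Z = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
good = rate_reduction(Z, [0, 0, 1, 1])   # groups match the structure
mixed = rate_reduction(Z, [0, 1, 0, 1])  # groups cut across the structure
```

In this toy case the grouping that matches the feature structure gives a strictly larger rate reduction than the mixed grouping, which is the property such an objective is meant to encourage.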
Problem

Research questions and friction points this paper is trying to address.

Generalized Category Discovery
open-set recognition
multi-modal representation learning
semi-supervised learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semi-Supervised Rate Reduction
Intra-Modality Alignment
Multi-Modal Representation Learning
Generalized Category Discovery
Prompt-based Knowledge Transfer
Wei He
Beijing University of Posts and Telecommunications
Xianghan Meng
School of Artificial Intelligence, Beijing University of Posts and Telecommunications
Zhiyuan Huang
School of Artificial Intelligence, Beijing University of Posts and Telecommunications
Xianbiao Qi
Shenzhen Intellifusion Technologies Co., Ltd.
Neural Network Optimization, Generative Models, Large-Scale Pretrain Models, OCR
Rong Xiao
Intellifusion Inc., Shenzhen, P.R. China
Chun-Guang Li
Associate Professor, Beijing University of Posts and Telecommunications
Subspace Clustering, Self-Supervised Learning, Time Series Modeling, Biomedical Engineering