Sparsity Hurts: Simple Linear Adapter Can Boost Generalized Category Discovery

📅 2026-05-05

🤖 AI Summary
This work addresses two bottlenecks in generalized category discovery (GCD): partial fine-tuning offers limited flexibility because it updates only the final ViT block, and visual prompt tuning is prone to overfitting. The authors propose LAGCD, which embeds a residual linear adapter into each Vision Transformer (ViT) block and adds an auxiliary distribution alignment loss to reduce prediction bias between seen and novel classes. Analyzing feature sparsity, the study shows that the non-linearity in conventional adapters impairs performance, whereas a purely linear adapter improves it. Experiments on both generic and fine-grained datasets show that LAGCD consistently improves over many sophisticated baselines.
📝 Abstract
Generalized Category Discovery (GCD) seeks to identify novel categories from unlabeled data while retaining the classification ability of seen categories. Prior GCD methods commonly leverage transferable representations from pre-trained models, adapting to downstream datasets via partial fine-tuning (updating only the final ViT block) and visual prompt tuning (appending learnable vectors to inputs). However, conventional partial fine-tuning offers limited flexibility, as it fails to adapt the entire model; meanwhile, visual prompt tuning is prone to overfitting, due to its sensitivity to initialization and inherently constrained capacity. To address these limitations, we propose LAGCD, a simple yet effective GCD approach that embeds a residual linear adapter into each ViT block. From the perspective of feature sparsity, we systematically show that non-linearity in conventional adapters impairs performance, whereas our linear adapter enhances it by enabling more flexible model capacity. We further introduce an auxiliary distribution alignment loss to mitigate the negative impact of biased predictions between seen and novel categories. Extensive experiments on both generic and fine-grained datasets confirm that LAGCD consistently improves performance over many sophisticated baselines. The source code is available at https://github.com/yebo0216best/LAGCD
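The contrast the abstract draws between a residual linear adapter and a conventional non-linear adapter can be illustrated with a small, framework-free sketch. This is not the authors' implementation (see the linked repository for that). The function names, the toy 4-dimensional feature, the identity and 0.5-scaled weights, the ReLU bottleneck form of the conventional adapter, and the zero initialization are all assumptions chosen for illustration. The abstract does not specify the adapter's shape, placement inside the block, or initialization.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def relu(v):
    """Element-wise ReLU; negative entries become exactly 0.0."""
    return [max(0.0, a) for a in v]


def linear_adapter(x, W):
    """Residual linear adapter: returns x + W x with no activation."""
    return [xi + ui for xi, ui in zip(x, matvec(W, x))]


def nonlinear_adapter(x, W_down, W_up):
    """Conventional adapter: returns (x + W_up ReLU(W_down x), hidden)."""
    hidden = relu(matvec(W_down, x))
    out = [xi + ui for xi, ui in zip(x, matvec(W_up, hidden))]
    return out, hidden


def zero_fraction(v):
    """Fraction of entries that are exactly zero (a crude sparsity measure)."""
    return sum(1 for a in v if a == 0.0) / len(v)


# Toy token feature and toy weights (illustrative values only).
x = [1.0, -2.0, 0.5, -0.5]
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
zeros = [[0.0] * 4 for _ in range(4)]
half = [[0.5 * a for a in row] for row in identity]

# Assumed zero initialization: the adapted block starts as the frozen block.
assert linear_adapter(x, zeros) == x

linear_out = linear_adapter(x, half)
nonlinear_out, hidden = nonlinear_adapter(x, identity, identity)

print("linear update sparsity:   ", zero_fraction(matvec(half, x)))
print("nonlinear hidden sparsity:", zero_fraction(hidden))
```

In this toy case the ReLU zeroes the two negative coordinates of the hidden feature, so the non-linear adapter cannot adjust those coordinates at all, while the linear adapter updates every coordinate. This only hints at the sparsity argument; the paper's analysis is carried out on real ViT features.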
Problem

Research questions and friction points this paper is trying to address.

Generalized Category Discovery
partial fine-tuning
visual prompt tuning
model adaptation
overfitting
Innovation

Methods, ideas, or system contributions that make the work stand out.

linear adapter
Generalized Category Discovery
feature sparsity
distribution alignment
Vision Transformer
Bo Ye
School of Computer Science and Engineering, Southeast University, Nanjing 210096, China, and the Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education, China
Kai Gan
School of Computer Science and Engineering, Southeast University, Nanjing 210096, China, and the Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education, China
Tong Wei
Southeast University
Machine Learning
Min-Ling Zhang
Professor, School of Computer Science and Engineering, Southeast University, China
Artificial Intelligence, Machine Learning, Data Mining