🤖 AI Summary
This work addresses two performance bottlenecks in generalized category discovery (GCD): the limited flexibility of partial fine-tuning and the tendency of visual prompt tuning to overfit. To overcome these limitations, the authors propose LAGCD, an adaptation scheme for Vision Transformers (ViTs) that embeds a residual linear adapter into each ViT block and adds an auxiliary distribution alignment loss to reduce prediction bias between known and novel classes. Analyzing adapters from the perspective of feature sparsity, the study shows that the non-linearity in conventional adapters impairs performance, whereas a purely linear adapter improves it. Experiments on both generic and fine-grained datasets show that the method consistently improves over many sophisticated baselines.
📝 Abstract
Generalized Category Discovery (GCD) seeks to identify novel categories from unlabeled data while retaining the ability to classify seen categories. Prior GCD methods commonly leverage transferable representations from pre-trained models, adapting to downstream datasets via partial fine-tuning (updating only the final ViT block) and visual prompt tuning (appending learnable vectors to inputs). However, conventional partial fine-tuning offers limited flexibility because it does not adapt the entire model, while visual prompt tuning is prone to overfitting due to its sensitivity to initialization and its inherently constrained capacity. To address these limitations, we propose LAGCD, a simple yet effective GCD approach that embeds a residual linear adapter into each ViT block. From the perspective of feature sparsity, we systematically show that the non-linearity in conventional adapters impairs performance, whereas our linear adapter improves it by allowing more flexible adaptation of the model. We further introduce an auxiliary distribution alignment loss to mitigate the negative impact of biased predictions between seen and novel categories. Extensive experiments on both generic and fine-grained datasets confirm that LAGCD consistently improves performance over many sophisticated baselines. The source code is available at https://github.com/yebo0216best/LAGCD
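
To make the idea of a residual linear adapter concrete, here is a minimal sketch in plain Python. It is not the LAGCD implementation: the function names, the bottleneck shape, the zero-initialized up-projection, and the ReLU variant used for contrast are all assumptions made for illustration. It only shows the general form x + (x W_down) W_up, with no activation between the two projections, and how a ReLU would zero out part of the bottleneck features.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def linear_adapter(x, w_down, w_up):
    # Residual linear adapter (assumed form): x + (x W_down) W_up,
    # with no activation between the two projections.
    return add(x, matmul(matmul(x, w_down), w_up))


def relu_adapter(x, w_down, w_up):
    # Conventional non-linear adapter, shown only for contrast.
    hidden = [[max(h, 0.0) for h in row] for row in matmul(x, w_down)]
    return add(x, matmul(hidden, w_up))


random.seed(0)
dim, bottleneck, n_tokens = 8, 4, 3  # hypothetical sizes
tokens = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_tokens)]
w_down = [[random.gauss(0.0, 0.02) for _ in range(bottleneck)] for _ in range(dim)]
# Zero up-projection (an assumed initialization): the adapter starts as identity.
w_up = [[0.0] * dim for _ in range(bottleneck)]

out = linear_adapter(tokens, w_down, w_up)

# Fraction of bottleneck activations that a ReLU would zero out.
hidden = matmul(tokens, w_down)
relu_sparsity = sum(h <= 0.0 for row in hidden for h in row) / (n_tokens * bottleneck)
```

In a ViT, one such adapter would sit inside each block while the pre-trained weights stay frozen; where exactly it is attached within the block is not stated in the abstract and is left open here.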