SPTNet: An Efficient Alternative Framework for Generalized Category Discovery with Spatial Prompt Tuning

📅 2024-03-20

🏛️ International Conference on Learning Representations

📈 Citations: 15

✨ Influential: 2

🤖 AI Summary

This paper addresses Generalized Category Discovery (GCD): classifying unlabeled images that may belong to either seen or unseen categories, given labeled examples from the seen categories only. Existing methods mainly adapt large pre-trained models to the task and pay little attention to the spatial structure of images. We instead propose Spatial Prompt Tuning (SPT), a prompt-learning method that exploits the spatial property of image data so the model can better focus on object parts, which transfer between seen and unseen categories. SPT is built into SPTNet, a two-stage adaptation framework that iteratively optimizes model parameters (fine-tuning) and data parameters (prompt learning). On the SSB benchmark, our method achieves an average accuracy of 61.4%, surpassing prior state-of-the-art methods by approximately 10%, while adding extra parameters amounting to only 0.117% of the backbone.

📝 Abstract

Generalized Category Discovery (GCD) aims to classify unlabelled images from both 'seen' and 'unseen' classes by transferring knowledge from a set of labelled 'seen' class images. A key theme in existing GCD approaches is adapting large-scale pre-trained models for the GCD task. An alternate perspective, however, is to adapt the data representation itself for better alignment with the pre-trained model. As such, in this paper, we introduce a two-stage adaptation approach termed SPTNet, which iteratively optimizes model parameters (i.e., model-finetuning) and data parameters (i.e., prompt learning). Furthermore, we propose a novel spatial prompt tuning method (SPT) which considers the spatial property of image data, enabling the method to better focus on object parts, which can transfer between seen and unseen classes. We thoroughly evaluate our SPTNet on standard benchmarks and demonstrate that our method outperforms existing GCD methods. Notably, we find our method achieves an average accuracy of 61.4% on the SSB, surpassing prior state-of-the-art methods by approximately 10%. The improvement is particularly remarkable as our method yields extra parameters amounting to only 0.117% of those in the backbone architecture. Project page: https://visual-ai.github.io/sptnet.
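
To make the idea of a prompt that respects spatial structure more concrete, here is a minimal sketch in plain Python. The abstract does not spell out the prompt layout, so the border-ring design, the overwrite (rather than additive) rule, the sharing of one prompt across all patches, and the names `apply_spatial_prompt` and `count_prompt_parameters` are all assumptions made for illustration, not the authors' implementation.

```python
# Illustrative sketch only: the border-ring prompt layout is an assumption,
# not something stated in the abstract.

def apply_spatial_prompt(image, patch_size, border, prompt):
    """Overwrite the outer `border` pixels of every patch with prompt values.

    image:  2D list (H x W) of floats, with H and W divisible by patch_size
    prompt: 2D list (patch_size x patch_size); only its border ring is used,
            and the same prompt is shared by all patches (hypothetical choice)
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]  # copy so the input stays untouched
    for y in range(height):
        for x in range(width):
            py, px = y % patch_size, x % patch_size  # position inside the patch
            on_border = (
                py < border or py >= patch_size - border
                or px < border or px >= patch_size - border
            )
            if on_border:
                out[y][x] = prompt[py][px]
    return out


def count_prompt_parameters(patch_size, border, channels=1):
    """Number of values in one shared border-ring prompt."""
    inner = patch_size - 2 * border
    return (patch_size * patch_size - inner * inner) * channels


image = [[1.0] * 8 for _ in range(8)]    # toy 8x8 single-channel image
prompt = [[0.5] * 4 for _ in range(4)]   # toy 4x4 prompt shared across patches
prompted = apply_spatial_prompt(image, patch_size=4, border=1, prompt=prompt)
num_prompt_values = count_prompt_parameters(patch_size=4, border=1)
```

In this toy layout the interior of each patch is left untouched and only a thin ring per patch would be learnable, which is one way a prompt can stay small relative to a backbone. The actual prompt size behind the reported 0.117% figure is not derived here.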

Problem

Research questions and friction points this paper is trying to address.

Classify unlabelled images from seen and unseen classes.

Optimize model and data parameters for better alignment.

Improve accuracy in Generalized Category Discovery tasks.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage adaptation approach for GCD

Spatial prompt tuning for image data

Iterative optimization of model and data
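
The iterative optimization of model and data parameters listed above can be pictured with a toy alternating scheme. This is a sketch under stated assumptions: the scalar "model" `w`, the scalar "prompt" `p` added to each input, the squared-error objective, the stage order, and the step counts are hypothetical stand-ins chosen so the example runs with the standard library alone. They are not SPTNet's losses or schedule.

```python
# Toy illustration only: w, p, and the squared-error loss are hypothetical
# stand-ins for model parameters, prompt parameters, and the GCD objective.

def loss(w, p, xs, ys):
    """Mean squared error of the prediction w * (x + p)."""
    return sum((w * (x + p) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grad_w(w, p, xs, ys):
    """Gradient of the loss with respect to the model parameter w."""
    return sum(2 * (w * (x + p) - y) * (x + p) for x, y in zip(xs, ys)) / len(xs)


def grad_p(w, p, xs, ys):
    """Gradient of the loss with respect to the data (prompt) parameter p."""
    return sum(2 * (w * (x + p) - y) * w for x, y in zip(xs, ys)) / len(xs)


def alternate(xs, ys, rounds=20, steps=5, lr=0.02):
    w, p = 1.0, 0.0
    history = [loss(w, p, xs, ys)]
    for _ in range(rounds):
        for _ in range(steps):   # stage 1: model frozen, update the prompt
            p -= lr * grad_p(w, p, xs, ys)
        for _ in range(steps):   # stage 2: prompt frozen, update the model
            w -= lr * grad_w(w, p, xs, ys)
        history.append(loss(w, p, xs, ys))
    return w, p, history


xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * (x + 1.0) for x in xs]   # targets generated with w = 2, p = 1
w, p, history = alternate(xs, ys)
initial_loss, final_loss = history[0], history[-1]
```

Each stage touches only one parameter group while the other is held fixed, which is the sense in which the two kinds of parameters are optimized in turn rather than in a single joint update.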
