SelfPrompt: Confidence-Aware Semi-Supervised Tuning for Robust Vision-Language Model Adaptation

📅 2025-01-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the performance degradation of vision-language models (VLMs) in semi-supervised learning—caused by accumulated pseudo-label noise and miscalibrated confidence—this paper proposes a cluster-guided pseudo-labelling and confidence-aware learning framework. SelfPrompt is a prompt-tuning method that combines clustering analysis, confidence calibration, weakly supervised sampling, and active learning to jointly optimize fully and weakly supervised objectives. Key contributions: (1) robust pseudo-labels grounded in the feature-cluster structure of the unlabelled data; and (2) a confidence-aware active sampling strategy that maximizes the efficiency of a limited annotation budget. Extensive evaluation across 13 benchmark datasets shows consistent state-of-the-art results: +6.23% average gain in standard semi-supervised learning, +6.25% in active semi-supervised learning, +4.9% in base-to-novel generalization under a 2-shot setup, and +11.78% on average in single-shot settings.
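The cluster-guided pseudo-labelling idea can be illustrated with a minimal sketch. This is not the paper's exact procedure: the function name, the use of k-means, and the `agree_threshold` parameter are illustrative assumptions. The core idea it shows is to cluster the unlabelled features, take the majority model prediction within each cluster as the cluster's label, and retain pseudo-labels only where a sample's own prediction agrees with that cluster consensus.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_guided_pseudo_labels(features, model_probs, n_clusters, agree_threshold=0.5):
    """Sketch of cluster-guided pseudo-labelling (illustrative, not the paper's exact method).

    features:    (N, D) embeddings of unlabelled samples
    model_probs: (N, C) VLM class probabilities for the same samples
    Returns (indices, labels) for the retained pseudo-labelled subset.
    """
    preds = model_probs.argmax(axis=1)
    clusters = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(features)

    keep_idx, keep_lab = [], []
    for c in range(n_clusters):
        members = np.where(clusters == c)[0]
        if members.size == 0:
            continue
        # The majority model prediction inside the cluster serves as the cluster label.
        counts = np.bincount(preds[members], minlength=model_probs.shape[1])
        cluster_label = counts.argmax()
        # Keep only members whose own prediction matches the cluster consensus,
        # and only if the cluster is coherent enough overall.
        agree = members[preds[members] == cluster_label]
        if agree.size / members.size >= agree_threshold:
            keep_idx.extend(agree.tolist())
            keep_lab.extend([cluster_label] * agree.size)
    return np.array(keep_idx), np.array(keep_lab)
```

Filtering by cluster agreement is one way to reduce the accumulation of noisy pseudo-labels that the summary describes: a confidently wrong prediction is discarded when it disagrees with the feature-space neighbourhood it falls into.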

📝 Abstract
We present SelfPrompt, a novel prompt-tuning approach for vision-language models (VLMs) in a semi-supervised learning setup. Existing methods for tuning VLMs in semi-supervised setups struggle with the negative impact of the miscalibrated VLMs on pseudo-labelling, and the accumulation of noisy pseudo-labels. SelfPrompt addresses these challenges by introducing a cluster-guided pseudo-labelling method that improves pseudo-label accuracy, and a confidence-aware semi-supervised learning module that maximizes the utilization of unlabelled data by combining supervised learning and weakly-supervised learning. Additionally, we investigate our method in an active semi-supervised learning setup, where the labelled set is strategically selected to ensure the best utilization of a limited labelling budget. To this end, we propose a weakly-supervised sampling technique that selects a diverse and representative labelled set, which can be seamlessly integrated into existing methods to enhance their performance. We conduct extensive evaluations across 13 datasets, significantly surpassing state-of-the-art performances with average improvements of 6.23% in standard semi-supervised learning, 6.25% in active semi-supervised learning, and 4.9% in base-to-novel generalization, using a 2-shot setup. Furthermore, SelfPrompt shows excellent generalization in single-shot settings, achieving an average improvement of 11.78%.
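The abstract's weakly-supervised sampling technique—selecting a diverse and representative labelled set under a limited labelling budget—can be sketched with a common diversity heuristic. The function name and the centroid-nearest selection rule are assumptions for illustration; the paper's actual sampling rule may differ.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_labelled_set(features, budget, random_state=0):
    """Sketch of diversity-aware labelled-set selection (illustrative heuristic).

    Cluster the unlabelled pool into `budget` groups and pick the sample
    closest to each centroid, so every selected sample represents a distinct
    region of the feature space.
    """
    km = KMeans(n_clusters=budget, n_init=10, random_state=random_state).fit(features)
    chosen = []
    for c, centre in enumerate(km.cluster_centers_):
        members = np.where(km.labels_ == c)[0]
        # Nearest member to the centroid is the most representative of its cluster.
        dists = np.linalg.norm(features[members] - centre, axis=1)
        chosen.append(members[dists.argmin()])
    return np.array(chosen)
```

Because the selection depends only on the feature embeddings, a heuristic like this can be bolted onto other semi-supervised methods, consistent with the abstract's claim that the sampling technique integrates seamlessly into existing pipelines.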
Problem

Research questions and friction points this paper is trying to address.

Vision-Language Models
Semi-supervised Learning
Labeling Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

SelfPrompt
semi-supervised learning
vision-language models