Explicit Uncertainty Modeling for Active CLIP Adaptation with Dual Prompt Tuning

πŸ“… 2026-02-04
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the challenge of efficiently adapting pre-trained vision-language models like CLIP under limited annotation budgets in active learning for image classification. The authors propose a dual-prompt tuning mechanism that introduces learnable positive and negative prompts into CLIP’s text encoder. The positive prompt enhances the discriminability of task-relevant textual embeddings, while the negative prompt enables principled uncertainty estimation by modeling the inverse probability of correct predictions, thereby guiding informative sample selection. This approach is the first to explicitly model uncertainty in active CLIP adaptation, moving beyond conventional implicit strategies based on entropy or clustering. Extensive experiments demonstrate that, under identical annotation budgets and across various fine-tuning paradigms, the proposed method significantly outperforms existing active learning strategies in both classification accuracy and sample selection efficiency.

πŸ“ Abstract
Pre-trained vision-language models such as CLIP exhibit strong transferability, yet adapting them to downstream image classification tasks under limited annotation budgets remains challenging. In active learning settings, the model must select the most informative samples for annotation from a large pool of unlabeled data. Existing approaches typically estimate uncertainty via entropy-based criteria or representation clustering, without explicitly modeling uncertainty from the model perspective. In this work, we propose a robust uncertainty modeling framework for active CLIP adaptation based on dual-prompt tuning. We introduce two learnable prompts in the textual branch of CLIP. The positive prompt enhances the discriminability of task-specific textual embeddings corresponding to lightweight-tuned visual embeddings, improving classification reliability. Meanwhile, the negative prompt is trained in a reversed manner to explicitly model the probability that the predicted label is correct, providing a principled uncertainty signal for guiding active sample selection. Extensive experiments across different fine-tuning paradigms demonstrate that our method consistently outperforms existing active learning methods under the same annotation budget.
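The selection mechanism described above can be sketched in code. This is a minimal illustration, not the paper's implementation: the function name, the use of cosine similarity with a temperature, and the exact uncertainty rule (reading the negative-prompt probability at the positively predicted class) are all assumptions made for clarity, since the paper's objective is not given here.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def select_for_annotation(img_emb, pos_txt, neg_txt, budget, temp=0.01):
    """Pick `budget` unlabeled samples with the highest dual-prompt uncertainty.

    img_emb: (N, D) visual embeddings of the unlabeled pool
    pos_txt: (C, D) class embeddings from the positive (task) prompt
    neg_txt: (C, D) class embeddings from the negative (reversed) prompt
    """
    def norm(a):
        return a / np.linalg.norm(a, axis=-1, keepdims=True)

    img, pos, neg = norm(img_emb), norm(pos_txt), norm(neg_txt)
    p_pos = softmax(img @ pos.T / temp)  # classification probs (positive prompt)
    p_neg = softmax(img @ neg.T / temp)  # reversed probs (negative prompt)

    pred = p_pos.argmax(axis=1)
    # Uncertainty signal: mass the negative prompt assigns to the class the
    # positive prompt predicted -- high when the two prompts disagree.
    uncertainty = p_neg[np.arange(len(pred)), pred]
    return np.argsort(-uncertainty)[:budget]
```

In a real pipeline the two text-embedding matrices would come from CLIP's text encoder with the learnable positive/negative prompt tokens prepended, and the selected indices would be sent for annotation before the next tuning round.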
Problem

Research questions and friction points this paper is trying to address.

active learning
uncertainty modeling
CLIP adaptation
image classification
limited annotation
Innovation

Methods, ideas, or system contributions that make the work stand out.

dual-prompt tuning
explicit uncertainty modeling
active learning
CLIP adaptation
negative prompt
πŸ”Ž Similar Papers
No similar papers found.