Optimizing Active Learning in Vision-Language Models via Parameter-Efficient Uncertainty Calibration

📅 2025-07-29

📈 Citations: 0

✨ Influential: 0

career value

160K/year

🤖 AI Summary

Addressing two key challenges in active learning for large-scale vision-language models (VLMs)—inaccurate uncertainty estimation and low sampling efficiency—this paper proposes a parameter-efficient, differentiable uncertainty calibration framework. Methodologically, we introduce a novel uncertainty calibration loss function and jointly optimize prompt tuning and LoRA to enable lightweight, end-to-end uncertainty modeling—eliminating reliance on hand-crafted features and enabling high-confidence sample selection with only a few labeled examples. Extensive experiments across multiple vision-language benchmarks and diverse backbone architectures demonstrate that our approach significantly outperforms existing active learning strategies under limited annotation budgets: it maintains or improves accuracy while reducing training overhead by 30–50%. To the best of our knowledge, this is the first work to achieve high-accuracy, low-cost, fully differentiable, end-to-end active learning for large-scale VLMs.

Technology Category

Application Category

📝 Abstract

Active Learning (AL) has emerged as a powerful approach for minimizing labeling costs by selectively sampling the most informative data for neural network model development. Effective AL for large-scale vision-language models necessitates addressing challenges in uncertainty estimation and efficient sampling given the vast number of parameters involved. In this work, we introduce a novel parameter-efficient learning methodology that incorporates uncertainty calibration loss within the AL framework. We propose a differentiable loss function that promotes uncertainty calibration for effectively selecting fewer and most informative data samples for fine-tuning. Through extensive experiments across several datasets and vision backbones, we demonstrate that our solution can match and exceed the performance of complex feature-based sampling techniques while being computationally very efficient. Additionally, we investigate the efficacy of Prompt learning versus Low-rank adaptation (LoRA) in sample selection, providing a detailed comparative analysis of these methods in the context of efficient AL.

Problem

Research questions and friction points this paper is trying to address.

Optimizing active learning for vision-language models efficiently

Improving uncertainty calibration in large-scale model sampling

Comparing Prompt learning and LoRA for sample selection

Innovation

Methods, ideas, or system contributions that make the work stand out.

Parameter-efficient uncertainty calibration loss

Differentiable loss for informative sampling

Prompt learning versus LoRA comparison

🔎 Similar Papers

Avoid Wasted Annotation Costs in Open-set Active Learning with Pre-trained Vision-Language Model