🤖 AI Summary
Addressing two key challenges in active learning for large-scale vision-language models (VLMs)—inaccurate uncertainty estimation and low sampling efficiency—this paper proposes a parameter-efficient, differentiable uncertainty calibration framework. Methodologically, we introduce a novel uncertainty calibration loss function and jointly optimize prompt tuning and LoRA to enable lightweight, end-to-end uncertainty modeling—eliminating reliance on hand-crafted features and enabling high-confidence sample selection with only a few labeled examples. Extensive experiments across multiple vision-language benchmarks and diverse backbone architectures demonstrate that our approach significantly outperforms existing active learning strategies under limited annotation budgets: it maintains or improves accuracy while reducing training overhead by 30–50%. To the best of our knowledge, this is the first work to achieve high-accuracy, low-cost, fully differentiable, end-to-end active learning for large-scale VLMs.
📝 Abstract
Active Learning (AL) has emerged as a powerful approach for minimizing labeling costs by selectively sampling the most informative data for neural network model development. Effective AL for large-scale vision-language models necessitates addressing challenges in uncertainty estimation and efficient sampling given the vast number of parameters involved. In this work, we introduce a novel parameter-efficient learning methodology that incorporates uncertainty calibration loss within the AL framework. We propose a differentiable loss function that promotes uncertainty calibration for effectively selecting fewer and most informative data samples for fine-tuning. Through extensive experiments across several datasets and vision backbones, we demonstrate that our solution can match and exceed the performance of complex feature-based sampling techniques while being computationally very efficient. Additionally, we investigate the efficacy of Prompt learning versus Low-rank adaptation (LoRA) in sample selection, providing a detailed comparative analysis of these methods in the context of efficient AL.