BayesAdapter: enhanced uncertainty estimation in CLIP few-shot adaptation

📅 2024-12-12
🤖 AI Summary
To address the insufficient uncertainty estimation and poor confidence–accuracy calibration of vision-language models (e.g., CLIP) in few-shot adaptation, this paper proposes the first Bayesian inference framework for CLIP adapters. Unlike conventional point-estimate adapters, our method models adapter parameters as posterior distributions and employs variational inference to construct a lightweight Bayesian neural network, enabling systematic quantification of predictive uncertainty. The key contribution is the first integration of Bayesian reasoning into CLIP adapter design—extending beyond maximum-a-posteriori (MAP) estimation to full posterior modeling. Experiments across multiple few-shot benchmarks demonstrate significant improvements in uncertainty calibration (reduced Expected Calibration Error, ECE) and selective classification performance (increased Area Under the ROC Curve, AUROC). Our approach consistently outperforms existing parameter-efficient adapters in both calibration error and selective accuracy.
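The summary's core idea — replacing a point-estimate adapter head with a posterior distribution over its weights and averaging predictions over posterior samples — can be illustrated with a minimal sketch. This is not the paper's implementation; the `VariationalAdapter` class, dimensions, and the fixed log-std initialization are all hypothetical, and the sketch shows only the mean-field Gaussian posterior with reparameterized sampling and Monte Carlo predictive uncertainty, not the variational training loop.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class VariationalAdapter:
    """Hypothetical sketch: mean-field Gaussian posterior over the
    weights of a linear adapter head on frozen CLIP features."""

    def __init__(self, dim_in, n_classes):
        # Variational parameters: a mean and a log-std per weight.
        self.mu = np.zeros((dim_in, n_classes))
        self.log_sigma = np.full((dim_in, n_classes), -3.0)

    def sample_weights(self):
        # Reparameterization trick: w = mu + sigma * eps, eps ~ N(0, I).
        eps = rng.standard_normal(self.mu.shape)
        return self.mu + np.exp(self.log_sigma) * eps

    def predict(self, feats, n_samples=20):
        # Monte Carlo average of class probabilities over weight samples.
        probs = np.mean(
            [softmax(feats @ self.sample_weights()) for _ in range(n_samples)],
            axis=0,
        )
        # Predictive entropy as a per-example uncertainty score.
        entropy = -np.sum(probs * np.log(probs + 1e-12), axis=-1)
        return probs, entropy

feats = rng.standard_normal((4, 16))   # stand-in for CLIP image features
adapter = VariationalAdapter(16, 3)
probs, unc = adapter.predict(feats)
```

A point-estimate (MAP) adapter corresponds to always using `self.mu`; sampling instead exposes the spread of the posterior, which is what drives the uncertainty estimates discussed above.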

📝 Abstract
The emergence of large pre-trained vision-language models (VLMs) represents a paradigm shift in machine learning, with unprecedented results in a broad span of visual recognition tasks. CLIP, one of the most popular VLMs, has exhibited remarkable zero-shot and transfer learning capabilities in classification. To transfer CLIP to downstream tasks, adapters constitute a parameter-efficient approach that avoids backpropagation through the large model (unlike related prompt learning methods). However, CLIP adapters have been developed to target discriminative performance, and the quality of their uncertainty estimates has been overlooked. In this work we show that the discriminative performance of state-of-the-art CLIP adapters does not always correlate with their uncertainty estimation capabilities, which are essential for safe deployment in real-world scenarios. We also demonstrate that one such adapter is obtained through MAP inference from a more general probabilistic framework. Based on this observation we introduce BayesAdapter, which leverages Bayesian inference to estimate a full probability distribution instead of a single point, better capturing the variability inherent in the parameter space. In a comprehensive empirical evaluation we show that our approach obtains high-quality uncertainty estimates in the predictions, standing out in calibration and selective classification. Our code will be publicly available upon acceptance of the paper.
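The calibration quality the abstract highlights is typically measured with the Expected Calibration Error (ECE): predictions are binned by confidence, and the gap between each bin's mean confidence and empirical accuracy is averaged with bin-size weights. A minimal sketch of the standard binned estimator (the bin count and toy data are illustrative, not the paper's setup):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: bin-size-weighted average gap between mean
    confidence and empirical accuracy in each confidence bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece, n = 0.0, len(confidences)
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(confidences[in_bin].mean() - correct[in_bin].mean())
            ece += in_bin.sum() / n * gap
    return ece

# Toy case of perfect calibration: 80% confidence, 80% accuracy.
conf = np.full(10, 0.8)
hits = np.array([1, 1, 1, 1, 1, 1, 1, 1, 0, 0])
print(round(expected_calibration_error(conf, hits), 4))  # → 0.0
```

A lower ECE means the model's stated confidence tracks its actual accuracy, which is the improvement the paper reports for BayesAdapter over point-estimate adapters.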
Problem

Research questions and friction points this paper is trying to address.

Uncertainty Estimation
Vision-Language Models
Few-shot Learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

BayesAdapter
Uncertainty Estimation
Vision-Language Models