Calibration-Aware Prompt Learning for Medical Vision-Language Models

📅 2025-09-18

🤖 AI Summary

Medical vision-language models (Med-VLMs) commonly suffer from poor confidence calibration, which increases risk in clinical decision-making. To address this, we propose CalibPrompt, the first framework to integrate confidence calibration directly into the prompt learning process of Med-VLMs. The method optimizes two objectives: (i) a regularizer that aligns smoothed accuracy with predicted confidence to improve calibration, and (ii) an angular separation loss on text features to enhance multimodal discriminative consistency. CalibPrompt achieves end-to-end calibration using only a small number of labeled samples and learnable prompt parameters. Extensive experiments across four state-of-the-art Med-VLMs and five medical imaging datasets demonstrate that CalibPrompt significantly improves calibration metrics, reducing expected calibration error (ECE) by 38.2% on average, while preserving classification accuracy. This work establishes a lightweight, general-purpose calibration paradigm for trustworthy medical AI.
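
For readers unfamiliar with the metric named in the summary, the following is a minimal sketch of the standard equal-width-bin ECE estimate. The function name, the bin count of 10, and the input layout (one top-class confidence and one correctness flag per prediction) are assumptions made for illustration; the paper's own evaluation code may bin or weight differently.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Sample-weighted mean of |accuracy - confidence| over equal-width bins.

    confidences: top-class probabilities in [0, 1], one per prediction.
    correct:     booleans, True where the prediction matched the label.
    n_bins:      illustrative default, not taken from the paper.
    """
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Bin k covers [k/n_bins, (k+1)/n_bins); confidence 1.0 is
        # clamped into the last bin.
        idx = min(n_bins - 1, int(conf * n_bins))
        bins[idx].append((conf, hit))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, h in members if h) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece
```

A model that reports 0.9 confidence on four predictions but gets only three right has a gap of 0.15 in that bin, so its ECE is about 0.15. A lower value means confidence tracks accuracy more closely.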

📝 Abstract

Medical Vision-Language Models (Med-VLMs) have demonstrated remarkable performance across diverse medical imaging tasks by leveraging large-scale image-text pretraining. However, their confidence calibration remains largely unexplored and is still a significant challenge. Miscalibrated predictions can lead to overconfident errors, undermining clinical trust and decision-making reliability. To address this, we introduce CalibPrompt, the first framework to calibrate Med-VLMs during prompt tuning. CalibPrompt optimizes a small set of learnable prompts with carefully designed calibration objectives in the scarce labeled-data regime. First, we study a regularizer that attempts to align the smoothed accuracy with the model's predicted confidences. Second, we introduce an angular separation loss to maximize textual feature proximity, with the aim of improving the reliability of the confidence estimates of multimodal Med-VLMs. Extensive experiments on four publicly available Med-VLMs and five diverse medical imaging datasets reveal that CalibPrompt consistently improves calibration without drastically affecting clean accuracy. Our code is available at https://github.com/iabh1shekbasu/CalibPrompt.

Problem

Research questions and friction points this paper is trying to address.

Addresses confidence calibration in medical vision-language models

Reduces overconfident errors in clinical decision-making

Optimizes calibration-aware prompts under scarce labeled data

Innovation

Methods, ideas, or system contributions that make the work stand out.

Calibration-aware prompt learning framework

Regularizer aligning smoothed accuracy with confidence

Angular separation loss for feature proximity
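
The two objectives listed above can be pictured with a rough sketch. This is not the authors' formulation: the function names, the label-smoothing style of "smoothed accuracy", the squared-gap penalty, and the reading of angular separation as low pairwise cosine similarity between class text features are all assumptions chosen for illustration. The linked repository holds the actual losses.

```python
import math


def soft_accuracy_gap(confidences, correct, smoothing=0.1):
    """Hypothetical calibration penalty: squared gap between the mean
    confidence and a smoothed batch accuracy.

    Assumption: "smoothed accuracy" is modeled here by softening each 0/1
    correctness flag to smoothing / (1 - smoothing).
    """
    if not confidences:
        raise ValueError("need at least one prediction")
    soft_hits = [1.0 - smoothing if hit else smoothing for hit in correct]
    mean_conf = sum(confidences) / len(confidences)
    soft_acc = sum(soft_hits) / len(soft_hits)
    return (mean_conf - soft_acc) ** 2


def mean_pairwise_cosine(text_features):
    """Hypothetical separation measure: mean cosine similarity over all
    distinct pairs of class text features. Under the assumed reading,
    a lower value means the classes are further apart in angle.
    """
    if len(text_features) < 2:
        raise ValueError("need at least two class features")
    normed = []
    for vec in text_features:
        norm = math.sqrt(sum(x * x for x in vec))
        normed.append([x / norm for x in vec])  # project onto unit sphere

    total, pairs = 0.0, 0
    for i in range(len(normed)):
        for j in range(i + 1, len(normed)):
            total += sum(a * b for a, b in zip(normed[i], normed[j]))
            pairs += 1
    return total / pairs
```

In a prompt-tuning loop, terms like these would typically be added to the classification loss with small weights, so that only the learnable prompt parameters absorb the calibration signal. The weighting scheme is likewise an assumption here.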
