🤖 AI Summary
Medical vision-language models (Med-VLMs) commonly suffer from poor confidence calibration, which increases risk in clinical decision-making. To address this, we propose CalibPrompt, the first framework to integrate confidence calibration directly into the prompt learning process of Med-VLMs. Our method optimizes two objectives: (i) a regularizer that aligns smoothed accuracy with predicted confidence to improve calibration, and (ii) an angular separation loss on text features to make multimodal confidence estimates more reliable. CalibPrompt calibrates the model end to end using only a small number of labeled samples and learnable prompt parameters. Extensive experiments across four state-of-the-art Med-VLMs and five medical imaging datasets demonstrate that CalibPrompt improves calibration, reducing expected calibration error (ECE) by 38.2% on average, while largely preserving classification accuracy. This work establishes a lightweight, general-purpose calibration paradigm for trustworthy medical AI.
📝 Abstract
Medical Vision-Language Models (Med-VLMs) have demonstrated remarkable performance across diverse medical imaging tasks by leveraging large-scale image-text pretraining. However, their confidence calibration remains largely unexplored and is still a significant challenge. Miscalibrated predictions can lead to overconfident errors, undermining clinical trust and decision-making reliability. To address this, we introduce CalibPrompt, the first framework to calibrate Med-VLMs during prompt tuning. CalibPrompt optimizes a small set of learnable prompts with carefully designed calibration objectives in a scarce labeled data regime. First, we study a regularizer that attempts to align the smoothed accuracy with the predicted model confidences. Second, we introduce an angular separation loss to maximize textual feature proximity, with the aim of improving the reliability of the confidence estimates of multimodal Med-VLMs. Extensive experiments on four publicly available Med-VLMs and five diverse medical imaging datasets reveal that CalibPrompt consistently improves calibration without drastically affecting clean accuracy. Our code is available at https://github.com/iabh1shekbasu/CalibPrompt.