🤖 AI Summary
To address the lack of statistically reliable predictions in few-shot transfer learning for medical vision-language models (VLMs), this paper introduces SCA-T, the first trustworthy classification framework tailored for medical VLMs, built on split conformal prediction (SCP). SCA-T performs unsupervised, transductive self-adaptation jointly on the calibration and test sets while preserving exchangeability, thereby ensuring both statistical validity and computational efficiency. We further propose a multimodal non-conformity scoring mechanism specifically designed for medical VLMs, integrated with task-aware fine-tuning strategies. Experiments across diverse medical imaging modalities and diagnostic tasks demonstrate that SCA-T significantly improves conformal set efficiency and conditional coverage while strictly satisfying the $1-\alpha$ empirical coverage guarantee. To our knowledge, this is the first method to provide both theoretical validity and practical applicability for trustworthy few-shot medical image classification.
📝 Abstract
Medical vision-language models (VLMs) have demonstrated unprecedented transfer capabilities and are increasingly adopted for data-efficient image classification. Despite their growing popularity, their reliability remains largely unexplored. This work explores the split conformal prediction (SCP) framework to provide trustworthiness guarantees when transferring such models based on a small labeled calibration set. Despite its potential, the generalist nature of VLM pre-training may degrade the properties of the predicted conformal sets on specific tasks. While common practice in transfer learning for discriminative purposes involves an adaptation stage, we observe that deploying such a solution for conformal purposes is suboptimal: adapting the model on the available calibration data breaks the exchangeability assumption that SCP requires between calibration and test data. To address this issue, we propose transductive split conformal adaptation (SCA-T), a novel pipeline for transfer learning in conformal scenarios, which performs an unsupervised transductive adaptation jointly on calibration and test data. We present comprehensive experiments utilizing medical VLMs across various image modalities, transfer tasks, and non-conformity scores. Our framework offers consistent gains in efficiency and conditional coverage over SCP while maintaining the same empirical guarantees.
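To make the SCP baseline concrete, the sketch below shows vanilla split conformal prediction with the standard LAC (least-ambiguous-set) non-conformity score, i.e., one minus the softmax probability of the true class. The synthetic softmax outputs, the score choice, and all variable names are illustrative assumptions, not the paper's specific multimodal scores or adaptation pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a VLM's softmax outputs on a 3-class task (assumption:
# real scores would come from the model's image-text similarities).
n_calib, n_test, n_classes = 500, 200, 3
calib_probs = rng.dirichlet(np.ones(n_classes), size=n_calib)
calib_labels = rng.integers(0, n_classes, size=n_calib)
test_probs = rng.dirichlet(np.ones(n_classes), size=n_test)

alpha = 0.1  # target miscoverage rate, guarantee is 1 - alpha

# LAC non-conformity score on the labeled calibration set:
# 1 - softmax probability assigned to the true class.
scores = 1.0 - calib_probs[np.arange(n_calib), calib_labels]

# Conformal quantile with the finite-sample correction (n + 1 in the numerator).
q_level = np.ceil((n_calib + 1) * (1 - alpha)) / n_calib
q_hat = np.quantile(scores, q_level, method="higher")

# Prediction sets: every class whose score is below the calibrated threshold.
pred_sets = (1.0 - test_probs) <= q_hat
avg_set_size = pred_sets.sum(axis=1).mean()
```

Under exchangeability of calibration and test data, each `pred_sets` row contains the true label with probability at least `1 - alpha`; the average set size (`avg_set_size`) is the efficiency metric the paper seeks to improve, and adapting the model only on the calibration split would break the exchangeability this guarantee rests on.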