AI Summary
Medical vision-language models often produce miscalibrated uncertainty estimates under distribution shift, which leads to overly large prediction sets with imbalanced class-wise coverage. Existing adaptation methods, meanwhile, violate exchangeability and forfeit finite-sample coverage guarantees. To address this, the work proposes LATA, a training-free, label-free, black-box conformal inference method that improves prediction efficiency and class balance without compromising statistical validity. LATA applies Laplacian smoothing to zero-shot probabilities on an image k-NN graph and introduces a failure-aware conformal scoring function, while preserving exchangeability and finite-sample coverage guarantees. Experiments across three medical VLMs and nine tasks show that LATA consistently yields smaller prediction sets and smaller class-wise coverage gaps than existing transductive baselines, and that it matches the target coverage or tracks it more tightly, all at substantially lower computational cost than label-requiring methods.
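To make the graph-smoothing idea concrete, the following is a minimal sketch rather than the authors' implementation. It builds a symmetric k-NN affinity over image embeddings and runs a few softmax-style updates that pull each sample's zero-shot probabilities toward those of its neighbors. The function names (`knn_affinity`, `smooth_probs`), the binary affinity weights, the weight `lam`, and the specific update rule are all assumptions chosen for illustration; the paper's CCCP mean-field updates may differ in form.

```python
import numpy as np

def knn_affinity(features, k):
    """Symmetric binary k-NN affinity from cosine similarity (illustrative choice)."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T
    np.fill_diagonal(sim, -np.inf)            # a sample is never its own neighbor
    idx = np.argsort(-sim, axis=1)[:, :k]     # indices of the k most similar samples
    n = f.shape[0]
    w = np.zeros((n, n))
    w[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(w, w.T)                 # symmetrize

def smooth_probs(probs, w, lam=1.0, n_iters=5, eps=1e-12):
    """Simplified neighbor-smoothing update (an assumption, not the paper's exact rule).

    Each iteration combines the log zero-shot probabilities with the summed
    current assignments of graph neighbors, then renormalizes with a softmax.
    No labels are used, and every sample in the pool is treated identically.
    """
    log_prior = np.log(probs + eps)
    z = probs.copy()
    for _ in range(n_iters):
        logits = log_prior + lam * (w @ z)
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        z = np.exp(logits)
        z /= z.sum(axis=1, keepdims=True)
    return z

# Toy pool: two tight clusters; sample 1 is weakly mispredicted by the zero-shot model.
features = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
probs = np.array([[0.9, 0.1], [0.45, 0.55], [0.1, 0.9], [0.2, 0.8]])
w = knn_affinity(features, k=1)
smoothed = smooth_probs(probs, w)
```

In this toy pool, sample 1 sits next to a confidently classified neighbor, so the smoothing flips its prediction toward the neighbor's class. Because the transform uses no labels and applies the same rule to calibration and test samples, it is the kind of symmetric operation that the summary describes as compatible with exchangeability.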
Abstract
Medical vision-language models (VLMs) are strong zero-shot recognizers for medical imaging, but their reliability under domain shift hinges on calibrated uncertainty with guarantees. Split conformal prediction (SCP) offers finite-sample coverage, yet prediction sets often become large (low efficiency) and class-wise coverage becomes unbalanced, that is, the class-conditioned coverage gap (CCV) is high, especially in few-shot, imbalanced regimes. Moreover, naively adapting to calibration labels breaks exchangeability and voids the guarantees. We propose \texttt{\textbf{LATA}} (Laplacian-Assisted Transductive Adaptation), a \textit{training- and label-free} refinement that operates on the joint calibration and test pool by smoothing zero-shot probabilities over an image-image k-NN graph using a small number of CCCP mean-field updates, preserving SCP validity via a deterministic transform. We further introduce a \textit{failure-aware} conformal score that plugs into the vision-language uncertainty (ViLU) framework, providing instance-level difficulty and label plausibility to improve prediction set efficiency and class-wise balance at fixed coverage. \texttt{\textbf{LATA}} is black-box (no VLM updates), compute-light (windowed transduction, no backprop), and includes an optional prior knob, so it can run strictly label-free or, if desired, in a label-informed variant that uses calibration marginals once. Across \textbf{three} medical VLMs and \textbf{nine} downstream tasks, \texttt{\textbf{LATA}} consistently reduces set size and CCV while matching or tightening target coverage, outperforming prior transductive baselines and narrowing the gap to label-using methods while using far less compute. Comprehensive ablations and qualitative analyses show that \texttt{\textbf{LATA}} sharpens zero-shot predictions without compromising exchangeability.
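For readers unfamiliar with split conformal prediction, the sketch below shows the standard recipe the abstract builds on. It is a generic illustration, not the paper's failure-aware score: the nonconformity score used here, one minus the probability of the true class, is a common textbook choice and an assumption on our part, as are the function names `conformal_threshold` and `prediction_sets`.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha):
    """Finite-sample-corrected quantile of calibration nonconformity scores.

    Score = 1 - p(true class). With n calibration samples, the threshold is
    the ceil((n + 1) * (1 - alpha))-th smallest score; if that rank exceeds n,
    the threshold is infinite and every label is included.
    """
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    rank = int(np.ceil((n + 1) * (1 - alpha)))
    if rank > n:
        return np.inf
    return np.sort(scores)[rank - 1]

def prediction_sets(test_probs, threshold):
    """Boolean mask of shape (n_test, n_classes): True where a label is in the set."""
    return (1.0 - test_probs) <= threshold

# Toy calibration split with three samples and three classes.
cal_probs = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.8, 0.1],
                      [0.3, 0.3, 0.4]])
cal_labels = np.array([0, 1, 2])
test_probs = np.array([[0.5, 0.45, 0.05]])

q = conformal_threshold(cal_probs, cal_labels, alpha=0.25)
sets = prediction_sets(test_probs, q)
```

Under exchangeability of calibration and test samples, sets built this way contain the true label with probability at least 1 - alpha. Set size (efficiency) then depends entirely on the quality of the probabilities fed in, which is where a label-free refinement of the zero-shot outputs can help.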