Logit-Level Uncertainty Quantification in Vision-Language Models for Histopathology Image Analysis

📅 2026-03-03
🤖 AI Summary
This work addresses the trustworthiness challenges of vision-language models (VLMs) in histopathological image analysis, where unreliable outputs and lack of transparency hinder clinical adoption. It proposes the first logit-level uncertainty quantification framework, which systematically evaluates model confidence by leveraging temperature-scaled output logits in conjunction with metrics including cosine similarity, Jensen-Shannon divergence, and Kullback-Leibler divergence. Experimental results reveal that general-purpose VLMs, such as VILA-M3-8B and LLaVA-Med, exhibit high sensitivity to prompt variations and temperature settings, leading to substantial stochasticity in their predictions. In contrast, the pathology-specialized model PRISM demonstrates near-deterministic behavior, underscoring both the efficacy of the framework and its clinical relevance for trustworthy deployment in medical imaging applications.

📝 Abstract
Vision-Language Models (VLMs), with their multimodal capabilities, have demonstrated remarkable success across almost all domains, including education, transportation, healthcare, energy, finance, law, and retail. Nevertheless, the use of VLMs in healthcare applications raises crucial concerns due to the sensitivity of large-scale medical data and the trustworthiness of these models (reliability, transparency, and security). This study proposes a logit-level uncertainty quantification (UQ) framework for histopathology image analysis using VLMs to address these concerns. UQ is evaluated for three VLMs using metrics derived from temperature-controlled output logits. The proposed framework reveals a critical separation in uncertainty behavior. The general-purpose VLMs show high stochastic sensitivity (mean cosine similarity (CS) $<0.71$ and $<0.84$, Jensen-Shannon divergence (JS) $<0.57$ and $<0.38$, and Kullback-Leibler divergence (KL) $<0.55$ and $<0.35$ for VILA-M3-8B and LLaVA-Med v1.5, respectively), near-maximal temperature impacts ($\Delta_T \approx 1.00$), and abrupt uncertainty transitions, particularly for complex diagnostic prompts. In contrast, the pathology-specific PRISM model maintains near-deterministic behavior (mean CS $>0.90$, JS $<0.10$, KL $<0.09$) and minimal temperature effects across all prompt complexities. These findings emphasize the importance of logit-level uncertainty quantification for evaluating trustworthiness in histopathology applications of VLMs.
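The metrics named in the abstract (temperature-scaled softmax over output logits, then CS, JS, and KL between the resulting distributions) can be sketched as follows. This is a minimal illustration, not the paper's exact protocol: the logit values and the temperature pairing below are hypothetical, and the paper may smooth or aggregate the metrics differently.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of raw logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cosine_similarity(p, q):
    """Cosine similarity between two probability vectors (CS)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q), with epsilon smoothing to avoid log(0)."""
    return sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q))

def js_divergence(p, q):
    """Symmetric Jensen-Shannon divergence via the mixture distribution."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical logits from two runs of the same prompt; values are illustrative only.
logits_run1 = [2.1, 0.3, -1.0, 0.8]
logits_run2 = [1.7, 0.9, -0.5, 0.4]

p = softmax(logits_run1, temperature=0.7)
q = softmax(logits_run2, temperature=1.5)

print(f"CS = {cosine_similarity(p, q):.3f}")
print(f"JS = {js_divergence(p, q):.3f}")
print(f"KL = {kl_divergence(p, q):.3f}")
```

Under this reading, a near-deterministic model (like PRISM in the abstract) would yield CS close to 1 and JS/KL close to 0 across repeated runs, while a stochastically sensitive model would show the opposite pattern.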
Problem

Research questions and friction points this paper is trying to address.

Uncertainty Quantification
Vision-Language Models
Histopathology
Logit-level
Trustworthiness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Logit-level Uncertainty Quantification
Vision-Language Models
Histopathology Image Analysis
Temperature-controlled Logits
Model Trustworthiness
Betul Yurdem
Department of Electrical and Electronics Engineering, Izmir Bakircay University, 35665 Izmir, Turkey
Ferhat Ozgur Catak
Assoc. Professor of Cyber Security at University of Stavanger
Trustworthy AI · Cyber Security · 5G/6G · Data Privacy · Cryptography
Murat Kuzlu
Batten College of Engineering and Technology, Old Dominion University, Norfolk, VA 23529, USA
Mehmet Kemal Gullu
Department of Electrical and Electronics Engineering, Izmir Bakircay University, 35665 Izmir, Turkey