Uncertainty-Driven Expert Control: Enhancing the Reliability of Medical Vision-Language Models

📅 2025-07-12
🤖 AI Summary
Medical vision-language models (MedVLMs) suffer from inherent probabilistic uncertainty and often generate unverified or erroneous responses that compromise clinical reliability. Existing mitigation strategies rely on costly fine-tuning and still struggle to align closely with domain-specific clinical knowledge. To address this, we propose a **fine-tuning-free expert-cooperative control framework**: first, uncertainty estimation identifies unreliable model outputs; second, external medical knowledge is retrieved to help experts highlight key information; third, classifier-free guidance adjusts token-level representations so that the refined output follows the expert highlights, closing an uncertainty-driven expert-refinement loop. Evaluated on three medical visual question answering benchmarks, our approach, using only a 4.2B-parameter model and limited expert annotations, outperforms state-of-the-art 13B-parameter models, which indicates that deployment in resource-constrained clinical settings is feasible.

📝 Abstract
The rapid advancements in Vision Language Models (VLMs) have prompted the development of multi-modal medical assistant systems. Despite this progress, current models still have inherent probabilistic uncertainties and often produce erroneous or unverified responses, an issue with serious implications in medical applications. Existing methods aim to enhance the performance of Medical Vision Language Models (MedVLMs) by adjusting model structure, fine-tuning with high-quality data, or preference fine-tuning. However, these training-dependent strategies are costly and still lack sufficient alignment with clinical expertise. To address these issues, we propose an expert-in-the-loop framework named Expert-Controlled Classifier-Free Guidance (Expert-CFG) to align MedVLMs with clinical expertise without additional training. This framework introduces an uncertainty estimation strategy to identify unreliable outputs. It then retrieves relevant references to assist experts in highlighting key terms and applies classifier-free guidance to refine the token embeddings of the MedVLM, ensuring that the adjusted outputs are correct and align with expert highlights. Evaluations across three medical visual question answering benchmarks demonstrate that the proposed Expert-CFG, with 4.2B parameters and limited expert annotations, outperforms state-of-the-art models with 13B parameters. The results demonstrate the feasibility of deploying such a system in resource-limited settings for clinical use.
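
The abstract does not say which uncertainty measure Expert-CFG uses. As a minimal sketch, assuming mean token entropy as the score and a hypothetical threshold of 0.5 nats (both are illustrative choices, not the authors' stated method), flagging an answer for expert review could look like this:

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of a single next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def mean_entropy(step_probs):
    """Average entropy over the distributions of all generated tokens."""
    return sum(token_entropy(p) for p in step_probs) / len(step_probs)

def needs_expert_review(step_probs, threshold=0.5):
    """Flag an answer whose mean token entropy exceeds the threshold.

    The threshold is an assumed value for illustration only.
    """
    return mean_entropy(step_probs) > threshold

# Toy per-token distributions over a three-word vocabulary (made-up numbers).
confident = [[0.97, 0.02, 0.01], [0.95, 0.03, 0.02]]
uncertain = [[0.40, 0.35, 0.25], [0.34, 0.33, 0.33]]

print(needs_expert_review(confident))
print(needs_expert_review(uncertain))
```

Only answers flagged this way would be routed to retrieval and expert highlighting, which keeps the amount of expert annotation small.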
Problem

Research questions and friction points this paper is trying to address.

Address probabilistic uncertainties in Medical Vision Language Models
Align model outputs with clinical expertise without additional training
Enhance reliability of medical responses using expert-guided framework
Innovation

Methods, ideas, or system contributions that make the work stand out.

Expert-CFG aligns MedVLM without extra training
Uncertainty estimation identifies unreliable outputs
Classifier-free guidance refines token embeddings
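
The summary does not give the exact guidance formula. The sketch below uses the standard classifier-free guidance combination, `uncond + scale * (cond - uncond)`, applied to next-token logits; the paper describes refining token embeddings, so the authors' variant may differ. The vocabulary, logit values, and guidance scale are made-up illustrations.

```python
import math

def cfg_logits(cond, uncond, scale):
    """Standard classifier-free guidance: push logits away from the
    unconditional prediction toward the conditional one.

    scale = 1.0 returns the conditional logits unchanged;
    scale > 1.0 amplifies the effect of the conditioning.
    """
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy vocabulary and logits (hypothetical values).
vocab = ["effusion", "normal", "mass"]
cond = [2.0, 1.0, 0.0]    # prompt that includes the expert-highlighted terms
uncond = [1.0, 1.5, 0.0]  # same prompt without the highlights

guided = cfg_logits(cond, uncond, scale=2.0)
probs = softmax(guided)
print(vocab[probs.index(max(probs))])
```

In this toy case the unguided model prefers "normal", while the guided logits favor "effusion", the token supported by the highlighted context.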
Xiao Liang
The Key Laboratory of Smart Human-Computer Interaction and Wearable Technology of Shaanxi Province, Xidian University, China
Di Wang
The Key Laboratory of Smart Human-Computer Interaction and Wearable Technology of Shaanxi Province, Xidian University, China
Zhicheng Jiao
Brown University Health, Warren Alpert Medical School of Brown University
Medical image analysis, Health informatics
Ronghan Li
Xidian University
Natural language processing, Machine Reading Comprehension, Dialogue System
Pengfei Yang
Institute of Software, Chinese Academy of Sciences
Probabilistic model checking, DNN verification
Quan Wang
The Key Laboratory of Smart Human-Computer Interaction and Wearable Technology of Shaanxi Province, Xidian University, China
Tat-Seng Chua
National University of Singapore, Singapore