MEDIC-AD: Towards Medical Vision-Language Model's Clinical Intelligence

📅 2026-03-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing medical vision-language models struggle to translate general knowledge into clinically actionable outputs and lack effective support for lesion detection, symptom tracking, and interpretability. This work proposes MEDIC-AD, the first medical VLM that integrates anomaly awareness, temporal difference modeling, and visual explainability. It introduces learnable <Ano> and <Diff> tokens to explicitly attend to pathological regions and longitudinal changes, respectively, and employs a multi-stage training framework to generate reasoning-consistent saliency maps. Experiments on longitudinal clinical imaging data demonstrate that MEDIC-AD significantly outperforms both closed-source and medical-specific baselines in lesion detection, symptom progression tracking, and anomaly segmentation, while exhibiting reliable predictive performance and clinically credible interpretability within real-world hospital workflows.
📝 Abstract
Lesion detection, symptom tracking, and visual explainability are central to real-world medical image analysis, yet current medical Vision-Language Models (VLMs) still lack mechanisms that translate their broad knowledge into clinically actionable outputs. To bridge this gap, we present MEDIC-AD, a clinically oriented VLM that strengthens these three capabilities through a stage-wise framework. First, learnable anomaly-aware tokens (<Ano>) encourage the model to focus on abnormal regions and build more discriminative, lesion-centered representations. Second, inter-image difference tokens (<Diff>) explicitly encode temporal changes between studies, allowing the model to distinguish worsening, improvement, and stability in disease burden. Finally, a dedicated explainability stage trains the model to generate heatmaps that highlight lesion-related regions, offering clear visual evidence that is consistent with the model's reasoning. Through this staged design, MEDIC-AD steadily boosts performance across anomaly detection, symptom tracking, and anomaly segmentation, achieving state-of-the-art results compared with both closed-source and medical-specialized baselines. Evaluations on longitudinal clinical data collected from real hospital workflows further show that MEDIC-AD delivers stable predictions and clinically faithful explanations in practical patient-monitoring and decision-support settings.
Problem

Research questions and friction points this paper is trying to address.

medical Vision-Language Models
lesion detection
symptom tracking
visual explainability
clinical intelligence
Innovation

Methods, ideas, or system contributions that make the work stand out.

anomaly-aware tokens
temporal difference modeling
visual explainability
medical vision-language model
lesion-centered representation
Woohyeon Park
AIDAS Laboratory, Seoul National University
Jaeik Kim
AIDAS Laboratory, Seoul National University
Sunghwan Steve Cho
AIDAS Laboratory, Seoul National University
Pa Hong
Samsung Changwon Hospital
Wookyoung Jeong
Samsung Medical Center
Yoojin Nam
Department of Radiology, Samsung Changwon Hospital
Radiology, Medical AI
Namjoon Kim
Samsung Changwon Hospital
Ginny Y. Wong
Data Scientist, NVIDIA AI Technology Center
Ka Chun Cheung
NVIDIA
Meshless method, partial differential equation, radial basis function
Jaeyoung Do
Department of Electrical and Computer Engineering, Seoul National University
Generative AI (LLMs), Multi-Modal AI (NLP/Vision), Big Data Systems