MEDIC-AD: Towards Medical Vision-Language Model's Clinical Intelligence

📅 2026-03-28
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing medical vision-language models (VLMs) struggle to translate general knowledge into clinically actionable outputs and offer limited support for lesion detection, symptom tracking, and interpretability. This work proposes MEDIC-AD, a clinically oriented medical VLM that combines anomaly awareness, temporal difference modeling, and visual explainability in a stage-wise training framework. Learnable <Ano> tokens direct attention to pathological regions, learnable <Diff> tokens encode longitudinal changes between studies, and a dedicated explainability stage trains the model to generate heatmaps consistent with its reasoning. The authors report state-of-the-art results against both closed-source and medical-specialized baselines on anomaly detection, symptom tracking, and anomaly segmentation, along with stable predictions and clinically faithful explanations on longitudinal data collected from real hospital workflows.

📝 Abstract
Lesion detection, symptom tracking, and visual explainability are central to real-world medical image analysis, yet current medical Vision-Language Models (VLMs) still lack mechanisms that translate their broad knowledge into clinically actionable outputs. To bridge this gap, we present MEDIC-AD, a clinically oriented VLM that strengthens these three capabilities through a stage-wise framework. First, learnable anomaly-aware tokens (<Ano>) encourage the model to focus on abnormal regions and build more discriminative lesion-centered representations. Second, inter-image difference tokens (<Diff>) explicitly encode temporal changes between studies, allowing the model to distinguish worsening, improvement, and stability in disease burden. Finally, a dedicated explainability stage trains the model to generate heatmaps that highlight lesion-related regions, offering clear visual evidence that is consistent with the model's reasoning. Through our staged design, MEDIC-AD steadily boosts performance across anomaly detection, symptom tracking, and anomaly segmentation, achieving state-of-the-art results compared with both closed-source and medical-specialized baselines. Evaluations on longitudinal clinical data collected from real hospital workflows further show that MEDIC-AD delivers stable predictions and clinically faithful explanations in practical patient-monitoring and decision-support settings.
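The abstract does not say how the <Ano> and <Diff> tokens are wired into the model, so the following is only a minimal sketch of one common way to introduce learnable special tokens: register them in the vocabulary, give each a freshly initialised embedding row that training can update, and place them in the input sequence for a pair of studies. The helper names (`extend_vocab`, `build_pair_sequence`), the image placeholders, the initialisation scale, and the token layout are all assumptions made for illustration and are not taken from the paper.

```python
import random

# Token names come from the abstract; everything else here is hypothetical.
ANO, DIFF = "<Ano>", "<Diff>"


def extend_vocab(vocab, embeddings, new_tokens, dim, seed=0):
    """Register new tokens and append one small random vector per token.

    In a real model these rows would be trainable parameters; here they are
    plain Python lists so the sketch stays dependency-free.
    """
    rng = random.Random(seed)
    for tok in new_tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)  # next free id
            embeddings.append([rng.gauss(0.0, 0.02) for _ in range(dim)])
    return vocab, embeddings


def build_pair_sequence(prior_patches, current_patches, question):
    """Assumed layout: one <Ano> after each study, one <Diff> after the pair."""
    return [*prior_patches, ANO, *current_patches, ANO, DIFF, *question]


# Toy vocabulary with 4-dimensional embeddings.
vocab = {"<pad>": 0, "is": 1, "the": 2, "lesion": 3, "larger": 4, "?": 5}
embeddings = [[0.0] * 4 for _ in vocab]
vocab, embeddings = extend_vocab(vocab, embeddings, [ANO, DIFF], dim=4)

seq = build_pair_sequence(
    ["<img_prior>"],    # placeholder for prior-study image features
    ["<img_current>"],  # placeholder for current-study image features
    ["is", "the", "lesion", "larger", "?"],
)
print(seq)
```

Under this assumed layout, the hidden state at each <Ano> position could be supervised with an anomaly objective and the state at <Diff> with a worsening/improvement/stable label. The paper's actual objectives and token placement may differ.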

Problem

Research questions and friction points this paper is trying to address.

medical Vision-Language Models
lesion detection
symptom tracking
visual explainability
clinical intelligence

Innovation

Methods, ideas, or system contributions that make the work stand out.

anomaly-aware tokens
temporal difference modeling
visual explainability
medical vision-language model
lesion-centered representation