Multi-Modal Explainable Medical AI Assistant for Trustworthy Human-AI Collaboration

πŸ“… 2025-05-11
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Current general-purpose medical AI achieves expert-level performance on biomedical perception tasks but suffers from two key clinical limitations: weak multimodal interpretability and inadequate prognostic modeling capability. To address these, we propose XMedGPTβ€”a clinician-oriented, multimodal interpretable AI assistant. Our method introduces a novel consistency-driven reliability index mechanism that establishes a tripartite interpretability loop integrating diagnostic outputs, visual attribution maps, and prognostic predictions. It further incorporates anatomy-aware visual grounding, interactive uncertainty quantification, multi-task joint pretraining, and survival-analysis-integrated prognostic modeling. Experiments demonstrate that XMedGPT achieves an anatomical localization IoU of 0.703, outperforms state-of-the-art prognostic models by 26.9% in concordance, attains an uncertainty estimation AUC of 0.862, and improves cross-modal generalization across 40 imaging modalities by 20.7%.
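The consistency-driven reliability index described above can be illustrated with a minimal sketch: ask the model the same clinical question several times under paraphrase, then score reliability as agreement with the modal answer. This is a hypothetical simplification of whatever mechanism XMedGPT actually uses; the `ask` callable and `paraphrases` list are illustrative assumptions, not the paper's API.

```python
from collections import Counter

def reliability_index(answers):
    """Fraction of answers that agree with the most common (modal) answer.

    A score of 1.0 means the model answered identically every time;
    lower scores indicate inconsistency, a proxy for uncertainty.
    """
    if not answers:
        return 0.0
    counts = Counter(a.strip().lower() for a in answers)
    return counts.most_common(1)[0][1] / len(answers)

def assess(ask, question, paraphrases):
    """Query a model (hypothetical `ask(question) -> str` wrapper) on a
    question plus paraphrases and return the agreement-based reliability."""
    answers = [ask(q) for q in [question, *paraphrases]]
    return reliability_index(answers)
```

For example, if three paraphrased queries yield answers `["Yes", "yes", "No"]`, the reliability index is 2/3, flagging the prediction as less trustworthy than a unanimous one.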

πŸ“ Abstract
Generalist Medical AI (GMAI) systems have demonstrated expert-level performance in biomedical perception tasks, yet their clinical utility remains limited by inadequate multi-modal explainability and suboptimal prognostic capabilities. Here, we present XMedGPT, a clinician-centric, multi-modal AI assistant that integrates textual and visual interpretability to support transparent and trustworthy medical decision-making. XMedGPT not only produces accurate diagnostic and descriptive outputs, but also grounds referenced anatomical sites within medical images, bridging critical gaps in interpretability and enhancing clinician usability. To support real-world deployment, we introduce a reliability indexing mechanism that quantifies uncertainty through consistency-based assessment via interactive question-answering. We validate XMedGPT across four pillars: multi-modal interpretability, uncertainty quantification, prognostic modeling, and rigorous benchmarking. The model achieves an IoU of 0.703 across 141 anatomical regions and a Kendall's tau-b of 0.479, demonstrating strong alignment between visual rationales and clinical outcomes. For uncertainty estimation, it attains an AUC of 0.862 on visual question answering and 0.764 on radiology report generation. In survival and recurrence prediction for lung and glioma cancers, it surpasses prior leading models by 26.9% and outperforms GPT-4o by 25.0%. Rigorous benchmarking across 347 datasets covering 40 imaging modalities, together with external validation spanning 4 anatomical systems, confirms exceptional generalizability, with performance gains over existing GMAI of 20.7% in in-domain evaluation and 16.7% on 11,530 in-house cases. Together, XMedGPT represents a significant leap forward in clinician-centric AI integration, offering trustworthy and scalable support for diverse healthcare applications.
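The reported anatomical localization score (IoU of 0.703 across 141 regions) is the standard intersection-over-union between a predicted and a reference region. A minimal sketch for axis-aligned bounding boxes, assuming `(x1, y1, x2, y2)` coordinates (the paper's exact grounding format is not specified here):

```python
def bbox_iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes (x1, y1, x2, y2).

    Returns a value in [0, 1]: 1.0 for identical boxes, 0.0 for disjoint ones.
    """
    # Coordinates of the intersection rectangle
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

An IoU of 0.703 means that, on average, roughly 70% of the combined predicted-plus-reference area is shared, which is typically considered a strong localization result for fine-grained anatomical regions.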
Problem

Research questions and friction points this paper is trying to address.

Enhances multi-modal explainability in medical AI systems
Improves prognostic capabilities for clinical decision-making
Quantifies uncertainty to ensure trustworthy human-AI collaboration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-modal AI integrates text and visual interpretability
Reliability indexing quantifies uncertainty via interactive QA
Surpasses prior models in prognostic accuracy by 26.9%
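The prognostic-accuracy claim above is stated in terms of concordance. A common way to measure this in survival modeling is Harrell's concordance index: the fraction of comparable patient pairs whose predicted risk ordering matches their observed outcome ordering. The sketch below is a generic textbook implementation, not the paper's evaluation code.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times: observed follow-up times.
    events: 1 if the event (e.g. death, recurrence) was observed, 0 if censored.
    risk_scores: higher score = higher predicted risk (shorter expected survival).

    A pair (i, j) is comparable when subject i had an observed event before
    time j; it is concordant when the model also assigned i the higher risk.
    """
    concordant, ties, comparable = 0, 0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    ties += 1
    return (concordant + 0.5 * ties) / comparable if comparable else 0.0
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a 26.9% improvement over prior models is a substantial gain on this scale.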