Cultivating Multimodal Intelligence: Interpretive Reasoning and Agentic RAG Approaches to Dermatological Diagnosis

📅 2025-07-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
In tele-dermatology, patients often provide only skin images and textual symptom descriptions, so diagnostic reasoning must be carried out asynchronously yet with high confidence. Method: We propose an interpretable reasoning framework for closed-set multimodal visual question answering (VQA) in dermatology. It integrates the outputs of fine-tuned open-source multimodal models from the Qwen, Gemma, and LLaMA families through a structured reasoning layer, and augments them with an agentic retrieval-augmented generation (RAG) mechanism, driven by a dermatology knowledge base, that emulates the systematic diagnostic workflow of clinicians. A fine-grained answer coordination strategy further improves response consistency. Contribution/Results: Our framework placed second in the 2025 ImageCLEF MEDIQA-MAGIC challenge, demonstrating competitive accuracy together with interpretable reasoning suited to asynchronous dermatological diagnosis.
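The summary does not describe how the answer coordination strategy works. One step any closed-set VQA pipeline needs is forcing each free-text model response onto exactly one allowed option, so that answers stay valid and comparable across models. A minimal Python sketch of that step, assuming plain string matching; `normalize`, `map_to_option`, and the example options are hypothetical illustrations, not the authors' method:

```python
import re
from typing import Optional, Sequence


def normalize(text: str) -> str:
    """Lowercase and collapse punctuation/whitespace runs into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def map_to_option(raw_answer: str, options: Sequence[str]) -> Optional[int]:
    """Map a free-text model answer onto the index of one closed-set option.

    Hypothetical rule: an exact (normalized) match wins; otherwise the longest
    option found as a whole-word phrase inside the answer is chosen, so
    "palm of hand" beats "hand". Returns None when no option matches.
    """
    answer = normalize(raw_answer)
    norm_options = [normalize(o) for o in options]
    for i, opt in enumerate(norm_options):
        if answer == opt:
            return i
    hits = [
        i
        for i, opt in enumerate(norm_options)
        if opt and re.search(rf"\b{re.escape(opt)}\b", answer)
    ]
    if not hits:
        return None
    return max(hits, key=lambda i: len(norm_options[i]))


options = ["hand", "palm of hand", "back"]
print(map_to_option("The lesion is on the palm of hand.", options))  # prints 1
```

Whole-word matching keeps "back" from firing inside words such as "feedback"; a `None` result signals that the response needs a fallback (for example, re-prompting the model).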

📝 Abstract
The second edition of the ImageCLEF MEDIQA-MAGIC challenge (2025), co-organized by researchers from Microsoft, Stanford University, and the Hospital Clinic of Barcelona, focuses on multimodal dermatology question answering and segmentation using real-world patient queries and images. This work addresses the Closed Visual Question Answering (CVQA) task, in which the goal is to select the correct answer to multiple-choice clinical questions based on both user-submitted images and the accompanying symptom descriptions. The proposed approach combines three core components: (1) fine-tuning open-source multimodal models from the Qwen, Gemma, and LLaMA families on the competition dataset; (2) a structured reasoning layer that reconciles and adjudicates between candidate model outputs; and (3) agentic retrieval-augmented generation (agentic RAG), which adds relevant information from the American Academy of Dermatology's symptom and condition database to fill gaps in the patient context. The team placed second among participating teams with a submission that ranked sixth among all individual submissions, demonstrating competitive performance and high accuracy. Beyond competitive benchmarks, this research addresses a practical challenge in telemedicine: diagnostic decisions must often be made asynchronously and from limited input, yet still be accurate and interpretable. By emulating the systematic reasoning patterns dermatologists use when evaluating skin conditions, this architecture provides a pathway toward more reliable automated diagnostic support systems.
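The abstract does not detail the agentic RAG loop. The general pattern is that the model repeatedly decides whether it needs more background, calls a retrieval tool, and folds the result into its context before answering. A minimal sketch of that decide, retrieve, update cycle, assuming a toy in-memory dictionary in place of the American Academy of Dermatology database and a keyword rule in place of the LLM's decision step; every name in it (`KNOWLEDGE_BASE`, `retrieve`, `choose_next_term`, `agentic_rag`, `build_prompt`) is hypothetical:

```python
import re
from typing import Dict, List, Optional, Set

# Hypothetical toy knowledge base standing in for the AAD symptom and
# condition pages; the values are placeholders, not medical content.
KNOWLEDGE_BASE: Dict[str, str] = {
    "itching": "[placeholder: AAD-style page on itching]",
    "eczema": "[placeholder: AAD-style page on eczema]",
    "psoriasis": "[placeholder: AAD-style page on psoriasis]",
}


def retrieve(term: str) -> Optional[str]:
    """Retrieval tool the agent may call: exact-key lookup in the knowledge base."""
    return KNOWLEDGE_BASE.get(term.lower())


def choose_next_term(query: str, looked_up: Set[str]) -> Optional[str]:
    """Stand-in for the LLM's decision step.

    Returns the first knowledge-base term mentioned in the query that has not
    been looked up yet, or None to signal that the context is sufficient.
    """
    for token in re.findall(r"[a-z]+", query.lower()):
        if token in KNOWLEDGE_BASE and token not in looked_up:
            return token
    return None


def agentic_rag(query: str, max_steps: int = 3) -> List[str]:
    """Decide, retrieve, update context; repeat until the agent stops
    or the step budget runs out."""
    context: List[str] = []
    looked_up: Set[str] = set()
    for _ in range(max_steps):
        term = choose_next_term(query, looked_up)
        if term is None:  # agent judges that no further lookup is needed
            break
        looked_up.add(term)
        doc = retrieve(term)
        if doc is not None:
            context.append(doc)
    return context


def build_prompt(query: str, context: List[str]) -> str:
    """Prepend retrieved background to the patient query for the answering model."""
    background = "\n".join(f"- {doc}" for doc in context) or "- (none retrieved)"
    return f"Background:\n{background}\n\nPatient query:\n{query}"


query = "Constant itching on the forearm; could this be eczema or psoriasis?"
print(build_prompt(query, agentic_rag(query)))
```

In a real system `choose_next_term` would be a model call that emits a tool request, and `max_steps` bounds cost when the model keeps asking for more context.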
Problem

Develops multimodal AI for dermatology diagnosis using images and text
Combines fine-tuned models, reasoning layers, and agentic RAG for accuracy
Addresses telemedicine challenges with interpretable, asynchronous diagnostic decisions
Innovation

Fine-tuned multimodal models for dermatology diagnosis
Structured reasoning layer for model output adjudication
Agentic RAG integrates dermatology database information
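Neither the abstract nor the summary states the rule the structured reasoning layer uses to adjudicate between candidate outputs. As a deliberately simple, swapped-in stand-in, a confidence-weighted vote shows what adjudication over Qwen, Gemma, and LLaMA candidates has to produce: one option index per question. The `Candidate` type, the per-model `weights`, and the tie-breaking rule are all assumptions, not the paper's method:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Candidate:
    model: str         # hypothetical label, e.g. "qwen", "gemma", "llama"
    option: int        # index of the closed-set option this model chose
    confidence: float  # assumed score in [0, 1]; how it is obtained is out of scope


def adjudicate(candidates: List[Candidate],
               weights: Optional[Dict[str, float]] = None) -> int:
    """Confidence-weighted vote over candidate answers.

    Each candidate adds weights[model] * confidence to its option's score
    (weight defaults to 1.0). The highest-scoring option wins; ties are
    broken toward the lower option index so the result is deterministic.
    """
    if not candidates:
        raise ValueError("adjudicate() needs at least one candidate")
    weights = weights or {}
    scores: Dict[int, float] = defaultdict(float)
    for c in candidates:
        scores[c.option] += weights.get(c.model, 1.0) * c.confidence
    return max(scores, key=lambda opt: (scores[opt], -opt))


votes = [Candidate("qwen", 0, 0.6), Candidate("gemma", 1, 0.9), Candidate("llama", 0, 0.5)]
print(adjudicate(votes))                  # prints 0 (1.1 vs 0.9)
print(adjudicate(votes, {"gemma": 2.0}))  # prints 1 (1.8 vs 1.1)
```

The vote is cheap and deterministic but gives no rationale for its choice, whereas the paper emphasizes interpretable reasoning over the candidates.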