CLARIFY: A Specialist-Generalist Framework for Accurate and Lightweight Dermatological Visual Question Answering

📅 2025-08-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the limited diagnostic accuracy and high computational overhead of general-purpose vision-language models (VLMs) in dermatological visual question answering, this paper proposes a specialist-generalist collaborative framework. A lightweight, domain-specific image classifier dynamically guides the inference path of a compressed VLM, while retrieval from a medical knowledge graph improves factual consistency and interpretability. The key innovation is a hierarchical multimodal collaboration mechanism in which the specialist model guides the generalist model's reasoning, constraining its output semantically and focusing its inference. Evaluated on a newly curated multimodal dermatology dataset, the method improves diagnostic accuracy by 18% over the strongest baseline, reduces GPU memory consumption by at least 20%, and cuts end-to-end latency by at least 5%, jointly improving accuracy, efficiency, and clinical trustworthiness.

📝 Abstract
Vision-language models (VLMs) have shown significant potential for medical tasks; however, their general-purpose nature can limit specialized diagnostic accuracy, and their large size imposes substantial inference costs on real-world clinical deployment. To address these challenges, we introduce CLARIFY, a Specialist-Generalist framework for dermatological visual question answering (VQA). CLARIFY combines two components: (i) a lightweight, domain-trained image classifier (the Specialist) that provides fast and highly accurate diagnostic predictions, and (ii) a powerful yet compressed conversational VLM (the Generalist) that generates natural-language explanations in response to user queries. In our framework, the Specialist's predictions directly guide the Generalist's reasoning, focusing it on the correct diagnostic path. This synergy is further enhanced by a knowledge graph-based retrieval module, which grounds the Generalist's responses in factual dermatological knowledge, ensuring both accuracy and reliability. This hierarchical design not only reduces diagnostic errors but also significantly improves computational efficiency. Experiments on our curated multimodal dermatology dataset demonstrate that CLARIFY achieves an 18% improvement in diagnostic accuracy over the strongest baseline, a fine-tuned, uncompressed single-line VLM, while reducing the average VRAM requirement and latency by at least 20% and 5%, respectively. These results indicate that a Specialist-Generalist system provides a practical and powerful paradigm for building lightweight, trustworthy, and clinically viable AI systems.
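The Specialist-to-Generalist hand-off can be pictured with a minimal Python sketch. It assumes the Specialist emits raw logits over a fixed set of diagnosis labels and that its guidance reaches the Generalist as text in the prompt. The label set, the top-k cutoff, the prompt wording, and the names `specialist_top_k` and `build_generalist_prompt` are hypothetical and are not taken from CLARIFY's implementation; knowledge-graph retrieval is left out here.

```python
import math
from dataclasses import dataclass


@dataclass
class Prediction:
    """One candidate diagnosis from a hypothetical Specialist classifier."""
    label: str
    prob: float


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over raw classifier logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def specialist_top_k(logits: list[float], labels: list[str], k: int = 3) -> list[Prediction]:
    """Convert Specialist logits into the k most probable diagnoses, best first."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda p: p[1], reverse=True)
    return [Prediction(label, prob) for label, prob in ranked[:k]]


def build_generalist_prompt(question: str, preds: list[Prediction]) -> str:
    """Inject the ranked diagnoses into the Generalist VLM's text prompt.

    The prompt template is an illustrative assumption, not CLARIFY's actual prompt.
    """
    candidates = "\n".join(f"- {p.label}: {p.prob:.2f}" for p in preds)
    return (
        "A dermatology image classifier produced these candidate diagnoses:\n"
        f"{candidates}\n"
        f"Question: {question}\n"
        "Answer with reference to the most probable diagnosis above."
    )


if __name__ == "__main__":
    # Hypothetical label set and logits, for illustration only.
    labels = ["eczema", "melanoma", "psoriasis", "basal cell carcinoma"]
    preds = specialist_top_k([0.2, 2.5, -1.0, 1.1], labels, k=2)
    print(build_generalist_prompt("What condition is shown?", preds))
```

Passing ranked candidates with probabilities, rather than a single hard label, is one way to let the Generalist hedge when the Specialist is uncertain; the abstract does not say which form of guidance CLARIFY uses.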

Problem

Improving diagnostic accuracy in dermatological visual question answering
Reducing computational costs for clinical deployment of AI
Enhancing reliability with knowledge-grounded explanations

Innovation

Specialist-Generalist framework for dermatological VQA
Lightweight domain-trained image classifier (the Specialist) whose diagnostic predictions guide the Generalist VLM
Knowledge graph-based retrieval for factual grounding
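Knowledge-graph grounding can be illustrated by looking up the triples attached to a predicted diagnosis and verbalizing them as context for the Generalist. The sketch below assumes a toy in-memory graph of (subject, relation, object) triples and a breadth-first expansion over a fixed number of hops. The triples, the hop limit, and the names `build_index`, `retrieve`, and `verbalize` are illustrative assumptions, not CLARIFY's actual knowledge graph or retrieval method.

```python
from collections import defaultdict

# A (subject, relation, object) edge in a hypothetical knowledge graph.
Triple = tuple[str, str, str]


def build_index(triples: list[Triple]) -> dict[str, list[Triple]]:
    """Group triples by subject entity for fast neighbour lookup."""
    index: dict[str, list[Triple]] = defaultdict(list)
    for s, r, o in triples:
        index[s].append((s, r, o))
    return index


def retrieve(index: dict[str, list[Triple]], entity: str, hops: int = 1) -> list[Triple]:
    """Breadth-first expansion from `entity`, collecting triples up to `hops` edges away."""
    seen = {entity}
    frontier = [entity]
    found: list[Triple] = []
    for _ in range(hops):
        next_frontier = []
        for e in frontier:
            for triple in index.get(e, []):
                found.append(triple)
                obj = triple[2]
                if obj not in seen:  # expand each entity at most once
                    seen.add(obj)
                    next_frontier.append(obj)
        frontier = next_frontier
    return found


def verbalize(triples: list[Triple]) -> list[str]:
    """Render triples as short sentences that can be placed in a prompt."""
    return [f"{s} {r.replace('_', ' ')} {o}" for s, r, o in triples]


if __name__ == "__main__":
    # Toy graph for illustration only.
    kg = [
        ("melanoma", "is_a", "skin cancer"),
        ("melanoma", "arises_from", "melanocytes"),
        ("skin cancer", "is_a", "neoplasm"),
        ("psoriasis", "is_a", "chronic inflammatory disease"),
    ]
    index = build_index(kg)
    for fact in verbalize(retrieve(index, "melanoma", hops=2)):
        print(fact)
```

Keying the lookup on the Specialist's predicted label keeps retrieval small and targeted; a production system would need entity linking between classifier labels and graph nodes, which this sketch assumes are identical strings.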