Enhancing Skin Disease Diagnosis: Interpretable Visual Concept Discovery with SAM Empowerment

📅 2024-09-14

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

career value

213K/year

🤖 AI Summary

Clinical smartphone-captured skin images suffer from severe noise, scarce expert annotations, and highly variable, subtle lesions—leading to poor robustness and limited interpretability in AI-based diagnosis. Method: We propose a cross-attention fusion framework grounded in the Segment Anything Model (SAM), the first to adapt SAM for unstandardized clinical photography without requiring pixel-level annotations. Our approach leverages promptable segmentation to generate localized lesion-centric visual concepts, which are then aligned with global image features at multiple scales and fused via cross-attention for semantic-driven diagnostic reasoning. A novel visual concept discovery mechanism bridges prompt-based segmentation with clinical diagnostic logic. Results: Evaluated on two real-world dermatological datasets, our method significantly improves classification accuracy and decision interpretability over state-of-the-art methods, especially under low-quality imaging and weakly supervised conditions, enabling robust, clinically actionable auxiliary diagnosis.

Technology Category

Application Category

📝 Abstract

Current AI-assisted skin image diagnosis has achieved dermatologist-level performance in classifying skin cancer, driven by rapid advancements in deep learning architectures. However, unlike traditional vision tasks, skin images in general present unique challenges due to the limited availability of well-annotated datasets, complex variations in conditions, and the necessity for detailed interpretations to ensure patient safety. Previous segmentation methods have sought to reduce image noise and enhance diagnostic performance, but these techniques require fine-grained, pixel-level ground truth masks for training. In contrast, with the rise of foundation models, the Segment Anything Model (SAM) has been introduced to facilitate promptable segmentation, enabling the automation of the segmentation process with simple yet effective prompts. Efforts applying SAM predominantly focus on dermatoscopy images, which present more easily identifiable lesion boundaries than clinical photos taken with smartphones. This limitation constrains the practicality of these approaches to real-world applications. To overcome the challenges posed by noisy clinical photos acquired via non-standardized protocols and to improve diagnostic accessibility, we propose a novel Cross-Attentive Fusion framework for interpretable skin lesion diagnosis. Our method leverages SAM to generate visual concepts for skin diseases using prompts, integrating local visual concepts with global image features to enhance model performance. Extensive evaluation on two skin disease datasets demonstrates our proposed method's effectiveness on lesion diagnosis and interpretability.

Problem

Research questions and friction points this paper is trying to address.

Artificial Intelligence

Skin Disease Diagnosis

Mobile Photography

Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-Attentive Fusion

SAM Model

Mobile Phone Dermatology Images

🔎 Similar Papers

Visual Evaluative AI: A Hypothesis-Driven Tool with Concept-Based Explanations and Weight of Evidence