Towards Explainable Skin Cancer Classification: A Dual-Network Attention Model with Lesion Segmentation and Clinical Metadata Fusion

📅 2025-10-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Early skin cancer diagnosis faces challenges including high intra-class variability, subtle inter-class distinctions, and insufficient model interpretability. To address these, we propose a dual-encoder cross-attention fusion framework: (1) a Deep-UNet enhanced with dual attention gates and an Atrous Spatial Pyramid Pooling (ASPP) module for precise lesion segmentation; (2) a DenseNet201-based dual-stream encoder jointly processing dermoscopic images and clinical metadata, integrated via multi-head cross-attention and Transformer-based cross-modal modeling; and (3) Grad-CAM visualization for interpretability validation. The framework effectively suppresses background bias and directs attention to pathologically salient regions. Evaluated on HAM10000 and ISIC benchmarks, our method achieves state-of-the-art performance in both segmentation and classification tasks, with significant improvements in AUC (+3.2%) and accuracy (+2.8%), while ensuring strong discriminative capability and clinical credibility.
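The core fusion step described above, multi-head cross-attention between the two DenseNet201 feature streams, can be sketched as follows. This is a minimal numpy illustration, not the paper's implementation: the projection matrices are random stand-ins for learned weights, and all names and dimensions are illustrative.

```python
import numpy as np

def cross_attention(q_feats, kv_feats, n_heads=4, seed=0):
    """One multi-head cross-attention step: queries from one encoder
    stream, keys/values from the other (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    d = q_feats.shape[-1]
    d_h = d // n_heads
    # Random projections stand in for learned Wq/Wk/Wv parameters.
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    Q, K, V = q_feats @ Wq, kv_feats @ Wk, kv_feats @ Wv
    # Split into heads: (tokens, d) -> (heads, tokens, d_h)
    split = lambda X: X.reshape(X.shape[0], n_heads, d_h).transpose(1, 0, 2)
    Qh, Kh, Vh = map(split, (Q, K, V))
    # Scaled dot-product attention per head, softmax over key tokens.
    scores = Qh @ Kh.transpose(0, 2, 1) / np.sqrt(d_h)
    attn = np.exp(scores - scores.max(-1, keepdims=True))
    attn /= attn.sum(-1, keepdims=True)
    # Merge heads back into a single feature dimension.
    out = (attn @ Vh).transpose(1, 0, 2).reshape(q_feats.shape[0], d)
    return out

# Toy example: 49 spatial tokens (a 7x7 feature map) per stream, 32-dim each.
img_tokens = np.random.default_rng(1).standard_normal((49, 32))  # original image
les_tokens = np.random.default_rng(2).standard_normal((49, 32))  # segmented lesion
fused = cross_attention(img_tokens, les_tokens)
print(fused.shape)  # (49, 32)
```

Letting the original-image stream query the segmented-lesion stream is one way such a design can suppress background features: attention weights concentrate on lesion-derived keys.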

📝 Abstract
Skin cancer is a life-threatening disease where early detection significantly improves patient outcomes. Automated diagnosis from dermoscopic images is challenging due to high intra-class variability and subtle inter-class differences, and many deep learning models operate as "black boxes," limiting clinical trust. In this work, we propose a dual-encoder attention-based framework that leverages both segmented lesions and clinical metadata to enhance skin lesion classification in terms of both accuracy and interpretability. A novel Deep-UNet architecture with Dual Attention Gates (DAG) and Atrous Spatial Pyramid Pooling (ASPP) is first employed to segment lesions. The classification stage uses two DenseNet201 encoders: one processes the original image and the other the segmented lesion, and their features are fused via multi-head cross-attention. This dual-input design guides the model to focus on salient pathological regions. In addition, a transformer-based module incorporates patient metadata (age, sex, lesion site) into the prediction. We evaluate our approach on the HAM10000 dataset and the ISIC 2018 and 2019 challenges. The proposed method achieves state-of-the-art segmentation performance and significantly improves classification accuracy and average AUC compared to baseline models. To validate our model's reliability, we use Gradient-weighted Class Activation Mapping (Grad-CAM) to generate heatmaps. These visualizations confirm that our model's predictions are based on the lesion area, unlike models that rely on spurious background features. These results demonstrate that integrating precise lesion segmentation and clinical data with attention-based fusion leads to a more accurate and interpretable skin cancer classification model.
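The Grad-CAM heatmaps used for interpretability validation follow a standard recipe: channel weights are the global-average-pooled gradients of the class score, and the heatmap is the ReLU of the weighted sum of activation maps. A minimal numpy sketch, with toy activations and gradients standing in for a real network's tensors:

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heatmap from a conv layer's activations (C, H, W) and the
    gradients of the target class score w.r.t. those activations."""
    # Channel importance: global-average-pool the gradients over space.
    weights = gradients.mean(axis=(1, 2))                          # (C,)
    # Weighted sum of activation maps; ReLU keeps positive evidence only.
    cam = np.maximum((weights[:, None, None] * activations).sum(0), 0.0)
    # Normalise to [0, 1] so the map can be overlaid on the input image.
    if cam.max() > 0:
        cam /= cam.max()
    return cam                                                     # (H, W)

# Toy tensors: 64 channels on a 7x7 map (real values come from autograd).
acts = np.random.default_rng(0).random((64, 7, 7))
grads = np.random.default_rng(1).standard_normal((64, 7, 7))
heat = grad_cam(acts, grads)
print(heat.shape)  # (7, 7)
```

A lesion-focused model should produce heatmaps whose mass lies inside the segmentation mask; background-heavy heatmaps are the spurious-feature failure mode the abstract describes.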
Problem

Research questions and friction points this paper is trying to address.

Enhancing skin cancer classification accuracy through lesion segmentation
Improving model interpretability by integrating clinical metadata
Addressing black-box limitations in dermatological deep learning systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-encoder attention framework fuses segmented lesions and metadata
Deep-UNet with attention gates segments lesions for classification
Transformer module integrates clinical metadata into cancer predictions
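One simple way to realize the metadata integration named in the last bullet is to embed (age, sex, lesion site) as an extra token appended to the image token sequence before the transformer. The sketch below assumes that token layout; the vocabularies, scaling, and projection are hypothetical, with a random matrix standing in for a learned embedding.

```python
import numpy as np

# Hypothetical category vocabularies; the paper only names the fields.
SEXES = ["male", "female", "unknown"]
SITES = ["back", "trunk", "scalp", "face", "lower extremity", "unknown"]

def metadata_token(age, sex, site, d=32, seed=0):
    """Map (age, sex, site) to a d-dim token that could be appended to the
    image token sequence before cross-modal modeling (illustrative)."""
    rng = np.random.default_rng(seed)
    feats = np.zeros(1 + len(SEXES) + len(SITES))
    feats[0] = age / 100.0                          # scaled numeric feature
    feats[1 + SEXES.index(sex)] = 1.0               # one-hot sex
    feats[1 + len(SEXES) + SITES.index(site)] = 1.0 # one-hot lesion site
    # Random matrix stands in for a learned embedding projection.
    W = rng.standard_normal((feats.size, d)) / np.sqrt(feats.size)
    return feats @ W                                # (d,)

tok = metadata_token(55, "female", "back")
print(tok.shape)  # (32,)
```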
Md. Enamul Atiq
Department of Electrical and Computer Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
Shaikh Anowarul Fattah
Professor, Dept. of EEE, BUET
Machine Learning, Signal Processing, Biomedical Engineering, Robotics, Power and Energy