Towards Explainable Skin Cancer Classification: A Dual-Network Attention Model with Lesion Segmentation and Clinical Metadata Fusion

📅 2025-10-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Early skin cancer diagnosis faces challenges including high intra-class variability, subtle inter-class distinctions, and insufficient model interpretability. To address these, we propose a dual-encoder cross-attention fusion framework: (1) a Deep-UNet enhanced with dual attention gates and an Atrous Spatial Pyramid Pooling (ASPP) module for precise lesion segmentation; (2) a DenseNet201-based dual-stream encoder jointly processing dermoscopic images and clinical metadata, integrated via multi-head cross-attention and Transformer-based cross-modal modeling; and (3) Grad-CAM visualization for interpretability validation. The framework effectively suppresses background bias and directs attention to pathologically salient regions. Evaluated on HAM10000 and ISIC benchmarks, our method achieves state-of-the-art performance in both segmentation and classification tasks, with significant improvements in AUC (+3.2%) and accuracy (+2.8%), while ensuring strong discriminative capability and clinical credibility.
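The core fusion step described above, multi-head cross-attention between the two DenseNet201 feature streams, can be sketched as follows. This is a minimal numpy illustration, not the paper's implementation: the projection matrices are random stand-ins for learned weights, and all names and dimensions are illustrative.

```python
import numpy as np

def cross_attention(q_feats, kv_feats, n_heads=4, seed=0):
    """One multi-head cross-attention step: queries from one encoder
    stream, keys/values from the other (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    d = q_feats.shape[-1]
    d_h = d // n_heads
    # Random projections stand in for learned Wq/Wk/Wv parameters.
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    Q, K, V = q_feats @ Wq, kv_feats @ Wk, kv_feats @ Wv
    # Split into heads: (tokens, d) -> (heads, tokens, d_h)
    split = lambda X: X.reshape(X.shape[0], n_heads, d_h).transpose(1, 0, 2)
    Qh, Kh, Vh = map(split, (Q, K, V))
    # Scaled dot-product attention per head, softmax over key tokens.
    scores = Qh @ Kh.transpose(0, 2, 1) / np.sqrt(d_h)
    attn = np.exp(scores - scores.max(-1, keepdims=True))
    attn /= attn.sum(-1, keepdims=True)
    # Merge heads back into a single feature dimension.
    out = (attn @ Vh).transpose(1, 0, 2).reshape(q_feats.shape[0], d)
    return out

# Toy example: 49 spatial tokens (a 7x7 feature map) per stream, 32-dim each.
img_tokens = np.random.default_rng(1).standard_normal((49, 32))  # original image
les_tokens = np.random.default_rng(2).standard_normal((49, 32))  # segmented lesion
fused = cross_attention(img_tokens, les_tokens)
print(fused.shape)  # (49, 32)
```

Letting the original-image stream query the segmented-lesion stream is one way such a design can suppress background features: attention weights concentrate on lesion-derived keys.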

📝 Abstract
Skin cancer is a life-threatening disease where early detection significantly improves patient outcomes. Automated diagnosis from dermoscopic images is challenging due to high intra-class variability and subtle inter-class differences, and many deep learning models operate as "black boxes," limiting clinical trust. In this work, we propose a dual-encoder attention-based framework that leverages both segmented lesions and clinical metadata to enhance skin lesion classification in terms of both accuracy and interpretability. A novel Deep-UNet architecture with Dual Attention Gates (DAG) and Atrous Spatial Pyramid Pooling (ASPP) is first employed to segment lesions. The classification stage uses two DenseNet201 encoders: one processes the original image and the other the segmented lesion, and their features are fused via multi-head cross-attention. This dual-input design guides the model to focus on salient pathological regions. In addition, a transformer-based module incorporates patient metadata (age, sex, lesion site) into the prediction. We evaluate our approach on the HAM10000 dataset and the ISIC 2018 and 2019 challenges. The proposed method achieves state-of-the-art segmentation performance and significantly improves classification accuracy and average AUC compared to baseline models. To validate our model's reliability, we use Gradient-weighted Class Activation Mapping (Grad-CAM) to generate heatmaps. These visualizations confirm that our model's predictions are based on the lesion area, unlike models that rely on spurious background features. These results demonstrate that integrating precise lesion segmentation and clinical data with attention-based fusion leads to a more accurate and interpretable skin cancer classification model.
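The Grad-CAM heatmaps used for interpretability validation follow a standard recipe: channel weights are the global-average-pooled gradients of the class score, and the heatmap is the ReLU of the weighted sum of activation maps. A minimal numpy sketch, with toy activations and gradients standing in for a real network's tensors:

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heatmap from a conv layer's activations (C, H, W) and the
    gradients of the target class score w.r.t. those activations."""
    # Channel importance: global-average-pool the gradients over space.
    weights = gradients.mean(axis=(1, 2))                          # (C,)
    # Weighted sum of activation maps; ReLU keeps positive evidence only.
    cam = np.maximum((weights[:, None, None] * activations).sum(0), 0.0)
    # Normalise to [0, 1] so the map can be overlaid on the input image.
    if cam.max() > 0:
        cam /= cam.max()
    return cam                                                     # (H, W)

# Toy tensors: 64 channels on a 7x7 map (real values come from autograd).
acts = np.random.default_rng(0).random((64, 7, 7))
grads = np.random.default_rng(1).standard_normal((64, 7, 7))
heat = grad_cam(acts, grads)
print(heat.shape)  # (7, 7)
```

A lesion-focused model should produce heatmaps whose mass lies inside the segmentation mask; background-heavy heatmaps are the spurious-feature failure mode the abstract describes.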
Problem

Research questions and friction points this paper is trying to address.

Enhancing skin cancer classification accuracy through lesion segmentation
Improving model interpretability by integrating clinical metadata
Addressing black-box limitations in dermatological deep learning systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-encoder attention framework fuses segmented lesions and metadata
Deep-UNet with attention gates segments lesions for classification
Transformer module integrates clinical metadata into cancer predictions
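One simple way to realize the metadata integration named in the last bullet is to embed (age, sex, lesion site) as an extra token appended to the image token sequence before the transformer. The sketch below assumes that token layout; the vocabularies, scaling, and projection are hypothetical, with a random matrix standing in for a learned embedding.

```python
import numpy as np

# Hypothetical category vocabularies; the paper only names the fields.
SEXES = ["male", "female", "unknown"]
SITES = ["back", "trunk", "scalp", "face", "lower extremity", "unknown"]

def metadata_token(age, sex, site, d=32, seed=0):
    """Map (age, sex, site) to a d-dim token that could be appended to the
    image token sequence before cross-modal modeling (illustrative)."""
    rng = np.random.default_rng(seed)
    feats = np.zeros(1 + len(SEXES) + len(SITES))
    feats[0] = age / 100.0                          # scaled numeric feature
    feats[1 + SEXES.index(sex)] = 1.0               # one-hot sex
    feats[1 + len(SEXES) + SITES.index(site)] = 1.0 # one-hot lesion site
    # Random matrix stands in for a learned embedding projection.
    W = rng.standard_normal((feats.size, d)) / np.sqrt(feats.size)
    return feats @ W                                # (d,)

tok = metadata_token(55, "female", "back")
print(tok.shape)  # (32,)
```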
Md. Enamul Atiq
Department of Electrical and Computer Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
Shaikh Anowarul Fattah
Professor, Dept. of EEE, BUET
Machine Learning, Signal Processing, Biomedical Engineering, Robotics, Power and Energy