🤖 AI Summary
This study addresses limited data availability and severe class imbalance in melanoma imaging, both of which hinder deep learning performance. It presents the first systematic evaluation of DCGAN, StyleGAN2, and two StyleGAN3 variants for generating high-resolution dermoscopic images. All models are trained under standardized preprocessing and hyperparameter optimization and assessed with FID, Fréchet Medical Distance (FMD), a frozen EfficientNet-based classifier, and independent evaluation by two board-certified dermatologists; StyleGAN2 emerges as the strongest model. On the ISIC 2020 dataset it achieves an FID of 7.96, the frozen classifier labels 83% of its generated images as melanoma, and dermatologists distinguish real from synthetic images with only 66.5% accuracy. When used for data augmentation, StyleGAN2-synthesized images improve melanoma detection AUC from 0.925 to 0.945, indicating that they preserve diagnostically relevant features and help mitigate class imbalance.
📝 Abstract
Melanoma is the most lethal form of skin cancer, and early detection is critical for improving patient outcomes. Although dermoscopy combined with deep learning has advanced automated skin-lesion analysis, progress is hindered by limited access to large, well-annotated datasets and by severe class imbalance, in which melanoma images are substantially underrepresented. To address these challenges, we present the first systematic benchmarking study comparing four GAN architectures (DCGAN, StyleGAN2, and two StyleGAN3 variants, T and R) for high-resolution melanoma-specific synthesis. We train and optimize all models on two expert-annotated benchmarks (ISIC 2018 and ISIC 2020) under unified preprocessing and hyperparameter exploration, with particular attention to tuning the R1 regularization weight (gamma). Image quality is assessed through a multi-faceted protocol combining a distribution-level metric (FID), a sample-level representativeness metric (FMD), qualitative dermoscopic inspection, downstream classification with a frozen EfficientNet-based melanoma detector, and independent evaluation by two board-certified dermatologists. StyleGAN2 achieves the best balance of quantitative performance and perceptual quality, attaining FID scores of 24.8 (ISIC 2018) and 7.96 (ISIC 2020) at gamma = 0.8. The frozen classifier recognizes 83% of StyleGAN2-generated images as melanoma, while dermatologists distinguish synthetic from real images with only 66.5% accuracy (chance = 50%) and low inter-rater agreement (kappa = 0.17). In a controlled augmentation experiment, adding synthetic melanoma images to address class imbalance improves melanoma detection AUC from 0.925 to 0.945 on a held-out real-image test set. These findings demonstrate that StyleGAN2-generated melanoma images preserve diagnostically relevant features and can provide a measurable benefit for mitigating class imbalance in melanoma-focused machine learning pipelines.
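The reader-study figures above (per-rater accuracy against a 50% chance level, and kappa as inter-rater agreement) can be illustrated with a minimal sketch. It assumes the reported kappa is Cohen's kappa for two raters, which the abstract does not state. The function names and the toy labels are hypothetical and are not the study's data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items (assumed statistic)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labelled identically.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement implied by each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # degenerate case: both raters used one identical label
    return (p_observed - p_expected) / (1.0 - p_expected)


def accuracy(predictions, truth):
    """Fraction of items on which a rater's call matches the ground truth."""
    return sum(p == t for p, t in zip(predictions, truth)) / len(truth)


# Hypothetical toy labels for eight images (not data from the study).
truth = ["real", "synthetic", "real", "synthetic",
         "real", "synthetic", "real", "synthetic"]
rater_1 = ["real", "synthetic", "real", "real",
           "real", "synthetic", "synthetic", "synthetic"]
rater_2 = ["real", "real", "real", "synthetic",
           "synthetic", "synthetic", "real", "synthetic"]

print("rater 1 accuracy:", accuracy(rater_1, truth))
print("rater 2 accuracy:", accuracy(rater_2, truth))
print("inter-rater kappa:", cohens_kappa(rater_1, rater_2))
```

In this toy case both raters are above chance individually, yet they err on different images, so their agreement with each other is no better than their marginal label frequencies would predict. This is the same qualitative pattern as an above-chance accuracy paired with a low kappa.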