🤖 AI Summary
This study addresses a limitation of existing skin lesion segmentation audits: they rely on coarse, discrete skin tone categories, which can obscure the true sources of performance disparities. On the HAM10000 and ISIC2017 datasets, the authors evaluate UNet, DeepLabV3 (with a ResNet50 backbone), and DINOv2, and introduce a continuous pigmentation analysis that treats pixel-level individual typology angle (ITA) values as distributions. They quantify lesion-to-surrounding-skin contrast as the Wasserstein distance between within-image ITA distributions. Within the range of skin tones represented in these datasets, global skin tone measures (Fitzpatrick grouping or mean ITA) are only weakly associated with segmentation performance, whereas low lesion-skin contrast is consistently associated with larger segmentation errors, indicating that boundary ambiguity and poor contrast are key drivers of model failure. The work thus offers a quantitative basis for auditing and improving the fairness and robustness of segmentation algorithms across diverse skin tones.
📝 Abstract
Skin cancer, particularly melanoma, remains a major cause of morbidity and mortality, making early detection critical. AI-driven dermatology systems often rely on skin lesion segmentation as a preprocessing step to delineate the lesion from surrounding skin and support downstream analysis. While fairness concerns regarding skin tone have been widely studied for lesion classification, the influence of skin tone on the segmentation stage remains under-quantified and is frequently assessed using coarse, discrete skin tone categories. In this work, we evaluate three strong segmentation architectures (UNet, DeepLabV3 with a ResNet50 backbone, and DINOv2) on two public dermoscopic datasets (HAM10000 and ISIC2017) and introduce a continuous pigment and contrast analysis that treats pixel-wise individual typology angle (ITA) values as distributions. Using Wasserstein distances between within-image distributions for skin-only, lesion-only, and whole-image regions, we quantify lesion-skin contrast and relate it to segmentation performance across multiple metrics. Within the range represented in these datasets, global skin tone metrics (Fitzpatrick grouping or mean ITA) show weak association with segmentation quality. In contrast, low lesion-skin contrast is consistently associated with larger segmentation errors across models, indicating that boundary ambiguity and low contrast are key drivers of failure. These findings suggest that fairness improvements in dermoscopic segmentation should prioritize robust handling of low-contrast lesions, and that distribution-based pigment measures provide a more informative audit signal than discrete skin-tone categories.
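To make the contrast measure concrete, the following is a minimal sketch of how a lesion-skin contrast score of this kind could be computed. It is not the authors' implementation. It assumes the image has already been converted to CIELAB (for example with `skimage.color.rgb2lab`), uses the standard ITA definition ITA = arctan((L* − 50) / b*) × 180/π, and computes an exact one-dimensional Wasserstein-1 distance with NumPy alone (`scipy.stats.wasserstein_distance` would serve equally well). The function names `ita_degrees`, `wasserstein_1d`, `lesion_skin_contrast`, and `make_image`, as well as the synthetic test values, are hypothetical.

```python
import numpy as np


def ita_degrees(lab):
    """Per-pixel ITA in degrees from a CIELAB image of shape (H, W, 3)."""
    L, b = lab[..., 0], lab[..., 2]
    # Standard definition: ITA = arctan((L* - 50) / b*) * 180 / pi.
    # Assumption: arctan2 is used so pixels with b* == 0 do not divide by zero;
    # for b* > 0 (typical for skin) it equals the plain arctan form.
    return np.degrees(np.arctan2(L - 50.0, b))


def wasserstein_1d(u, v):
    """Exact 1-D Wasserstein-1 distance between two empirical samples."""
    u, v = np.sort(np.ravel(u)), np.sort(np.ravel(v))
    if u.size == 0 or v.size == 0:
        raise ValueError("both samples must be non-empty")
    grid = np.sort(np.concatenate([u, v]))
    widths = np.diff(grid)
    # Empirical CDFs evaluated on the pooled grid; the distance is the
    # area between the two CDFs.
    u_cdf = np.searchsorted(u, grid[:-1], side="right") / u.size
    v_cdf = np.searchsorted(v, grid[:-1], side="right") / v.size
    return float(np.sum(np.abs(u_cdf - v_cdf) * widths))


def lesion_skin_contrast(lab, lesion_mask):
    """Distance between lesion-only and skin-only ITA distributions."""
    ita = ita_degrees(lab)
    mask = np.asarray(lesion_mask, dtype=bool)
    return wasserstein_1d(ita[mask], ita[~mask])


def make_image(lesion_L, skin_L=70.0, b=20.0, size=32):
    """Synthetic CIELAB image: uniform skin with a square 'lesion'."""
    lab = np.zeros((size, size, 3))
    lab[..., 0] = skin_L
    lab[..., 2] = b
    mask = np.zeros((size, size), dtype=bool)
    mask[8:24, 8:24] = True
    lab[mask, 0] = lesion_L
    return lab, mask


high = lesion_skin_contrast(*make_image(lesion_L=40.0))  # dark lesion
low = lesion_skin_contrast(*make_image(lesion_L=68.0))   # faint lesion
print(f"high-contrast score: {high:.2f}, low-contrast score: {low:.2f}")
```

In an audit along the lines the abstract describes, a per-image score like this would be paired with a per-image segmentation metric and the association between the two examined. The same `wasserstein_1d` helper could be applied to whole-image ITA distributions as well.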