🤖 AI Summary
This study systematically audits quality deficiencies in two widely used dermatological image datasets, DermaMNIST and Fitzpatrick17k, focusing on label noise, image blurriness, non-visible lesions, and inconsistent Fitzpatrick skin-type annotations. We introduce what we describe as the first reproducible, multi-dimensional evaluation framework for these datasets, integrating image sharpness analysis, label confidence calibration, lesion segmentation validation, and cross-dataset statistical analysis of skin-tone distributions. Results show that 23% of samples exhibit severe label errors or non-visible lesions, and that the Fitzpatrick skin-type misclassification rate reaches 18.7%. Beyond quantifying structural limitations that undermine clinical generalizability, this work establishes a standardized quality assessment procedure and provides empirically grounded correction strategies. By identifying and characterizing dataset-level biases and inaccuracies, it offers a cleaner data benchmark for developing robust, clinically deployable dermatological AI systems.
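
The summary does not say how image sharpness is measured. As an illustration only, the sketch below uses the variance of a discrete Laplacian, a common blur heuristic, which is an assumption here and not necessarily the method used in the study. The function names `laplacian_variance` and `is_blurry` and the threshold value are hypothetical.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Variance of a 4-neighbour Laplacian over interior pixels.

    `gray` is a list of equal-length rows of numeric intensities (at least 3x3).
    Sharp images have strong local intensity changes, so the variance is high;
    blurred or featureless images give values near zero.
    """
    height, width = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, height - 1)
        for x in range(1, width - 1)
    ]
    return pvariance(responses)


def is_blurry(gray, threshold=100.0):
    # The threshold is a hypothetical placeholder; in practice it would need
    # to be tuned per dataset and image resolution.
    return laplacian_variance(gray) < threshold


# Toy inputs: a high-contrast checkerboard versus a uniform grey patch.
checker = [[255 * ((x + y) % 2) for x in range(8)] for y in range(8)]
flat = [[128] * 8 for _ in range(8)]

print(is_blurry(checker), is_blurry(flat))
```

A score like this only flags candidates for review: a low value can also come from a genuinely smooth skin region, so flagged images would still need manual inspection.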