Investigating the Quality of DermaMNIST and Fitzpatrick17k Dermatological Image Datasets

📅 2024-01-25
🏛️ Scientific Data
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study systematically audits quality deficiencies in two leading dermatological image datasets, DermaMNIST and Fitzpatrick17k, focusing on label noise, image blurriness, lesion invisibility, and inconsistent Fitzpatrick skin-type annotations. It introduces the first reproducible, multi-dimensional evaluation framework integrating image sharpness analysis, label confidence calibration, lesion segmentation validation, and cross-dataset statistical analysis of skin-tone distributions. Results reveal that 23% of samples exhibit severe label errors or non-visible lesions, while Fitzpatrick skin-type misclassification reaches 18.7%. Beyond quantifying structural limitations that undermine clinical generalizability, the work establishes a standardized quality assessment paradigm and provides empirically grounded correction strategies. By identifying and characterizing dataset-level biases and inaccuracies, the approach delivers a foundational data benchmark for developing robust, clinically deployable dermatological AI systems.

Problem

Research questions and friction points this paper is trying to address.

Data Quality
Deep Learning
Skin Image Databases
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep Learning
Skin Image Databases
Data Quality Improvement
Kumar Abhishek
School of Computing Science, Simon Fraser University, Canada
Aditi Jain
Department of Mathematics, Indian Institute Of Technology Delhi, India
Ghassan Hamarneh
Computing Science, Simon Fraser University
Medical Image Analysis, Medical Image Computing, Machine Learning, Deep Learning, Computer Vision