CleanPatrick: A Benchmark for Image Data Cleaning

📅 2025-05-16
🤖 AI Summary
Existing image data cleaning benchmarks rely predominantly on synthetic noise or narrow human studies, which limits their realism and makes results hard to compare across studies. This work introduces CleanPatrick, a large-scale, real-world benchmark for image data cleaning, built upon the Fitzpatrick17k dermatology dataset and annotated by 933 medical crowd workers who provided 496,377 binary labels. The benchmark identifies off-topic samples (4%), near-duplicates (21%), and label errors (22%), and derives high-quality ground truth by aggregating annotations with a model inspired by item-response theory, followed by expert review. Issue detection is formalized as a ranking task and evaluated with typical ranking metrics. Experiments show that self-supervised representations excel at near-duplicate detection, that classical methods are competitive for off-topic detection under constrained review budgets, and that label-error detection remains an open challenge for fine-grained medical classification.

📝 Abstract
Robust machine learning depends on clean data, yet current image data cleaning benchmarks rely on synthetic noise or narrow human studies, limiting comparison and real-world relevance. We introduce CleanPatrick, the first large-scale benchmark for data cleaning in the image domain, built upon the publicly available Fitzpatrick17k dermatology dataset. We collect 496,377 binary annotations from 933 medical crowd workers, identify off-topic samples (4%), near-duplicates (21%), and label errors (22%), and employ an aggregation model inspired by item-response theory followed by expert review to derive high-quality ground truth. CleanPatrick formalizes issue detection as a ranking task and adopts typical ranking metrics mirroring real audit workflows. Benchmarking classical anomaly detectors, perceptual hashing, SSIM, Confident Learning, NoiseRank, and SelfClean, we find that, on CleanPatrick, self-supervised representations excel at near-duplicate detection, classical methods achieve competitive off-topic detection under constrained review budgets, and label-error detection remains an open challenge for fine-grained medical classification. By releasing both the dataset and the evaluation framework, CleanPatrick enables a systematic comparison of image-cleaning strategies and paves the way for more reliable data-centric artificial intelligence.
Problem

Research questions and friction points this paper is trying to address.

Lack of large-scale benchmarks for real-world image data cleaning
Need for accurate detection of off-topic, near-duplicate, and label-error samples
Challenges in evaluating cleaning methods for fine-grained medical classification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale benchmark built on the Fitzpatrick17k dataset
Annotation aggregation model inspired by item-response theory
Formalization of issue detection as a ranking task
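
To make the ranking formulation concrete, here is a minimal sketch of how a cleaning method could be scored when it outputs one "issue score" per sample. It is not the CleanPatrick evaluation code: the function names, the toy scores, and the choice of precision@k and average precision as the "typical ranking metrics" are assumptions made for illustration.

```python
from typing import List, Sequence


def rank_by_score(scores: Sequence[float], labels: Sequence[int]) -> List[int]:
    """Order ground-truth labels by descending issue score (hypothetical helper)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def precision_at_k(ranked_labels: Sequence[int], k: int) -> float:
    """Fraction of true issues among the first k inspected samples."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = ranked_labels[:k]  # shorter than k if fewer samples exist
    return sum(top) / len(top) if top else 0.0


def average_precision(ranked_labels: Sequence[int]) -> float:
    """Mean of precision@k taken at every rank k that holds a true issue."""
    hits = 0
    total = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Toy example (made-up values): 1 marks a sample confirmed as an issue.
scores = [0.9, 0.1, 0.8, 0.3, 0.7]
labels = [1, 0, 0, 0, 1]
ranked = rank_by_score(scores, labels)
p_at_3 = precision_at_k(ranked, 3)
ap = average_precision(ranked)
```

In this framing, a reviewer with a fixed budget inspects samples from the top of the ranking, so precision@k reflects how much of that budget lands on real issues, while average precision summarizes the whole ordering.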
Fabian Groger
University of Basel, Lucerne University of Applied Sciences and Arts
Simone Lionetti
Senior Research Associate, HSLU
Machine Learning, Theoretical Particle Physics
P. Gottfrois
University of Basel
Á. González-Jiménez
Lucerne University of Applied Sciences and Arts
L. Amruthalingam
Lucerne University of Applied Sciences and Arts
E. Goessinger
University Hospital of Basel
Hanna Lindemann
University Hospital of Basel
Marie Bargiela
University Hospital of Basel
Marie Hofbauer
University Hospital of Basel
Omar Badri
Northeast Dermatology Associates
Philipp Tschandl
Medical University of Vienna
A. Koochek
Banner Health
Matthew Groh
Northwestern University
Human-AI Collaboration, Computational Social Science, Cognitive Science, Affective Computing
Alexander A. Navarini
University of Basel, Department of Dermatology
Genetics, Dermatology, Psoriasis, Neutrophils, Hair
M. Pouly
Lucerne University of Applied Sciences and Arts