CytoCrowd: A Multi-Annotator Benchmark Dataset for Cytology Image Analysis

📅 2026-02-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the scarcity of medical imaging datasets that capture inter-annotator disagreement while also providing an independent gold standard, a gap that hinders objective evaluation of model robustness. To this end, the authors introduce CytoCrowd, a benchmark dataset comprising 446 high-resolution cytology images, each annotated independently by four pathologists and accompanied by a gold standard established by a senior expert. CytoCrowd is the first cytology image dataset to offer both multiple raw expert annotations and a separate reference standard, enabling joint evaluation of standard vision tasks (such as object detection and classification) and annotation aggregation algorithms. By releasing the dataset along with baseline results, this study establishes a realistic, quantifiable benchmark for investigating annotation inconsistency and evaluating fusion strategies, thereby supporting the development of robust medical image analysis models.
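
Where the summary refers to quantifying annotation inconsistency, a natural first step is simply measuring how often the four pathologists agree with one another. The sketch below is illustrative only and is not the authors' code: the label arrays are hypothetical placeholders standing in for per-object class labels matched across annotators, and the script reports the mean pairwise Cohen's kappa.

```python
# Illustrative sketch: quantify inter-annotator disagreement with
# mean pairwise Cohen's kappa. The label arrays are synthetic stand-ins
# for matched per-object class labels from four annotators.
from itertools import combinations

import numpy as np
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)
n_objects, n_classes, n_annotators = 200, 4, 4

# Placeholder labels; in practice these would be the four pathologists'
# class labels for the same set of annotated objects.
labels = [rng.integers(0, n_classes, size=n_objects) for _ in range(n_annotators)]

# Cohen's kappa for every annotator pair, then averaged.
pairwise = [cohen_kappa_score(labels[i], labels[j])
            for i, j in combinations(range(n_annotators), 2)]
print(f"mean pairwise Cohen's kappa: {np.mean(pairwise):.3f}")
```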

📝 Abstract
High-quality annotated datasets are crucial for advancing machine learning in medical image analysis. However, a critical gap exists: most datasets either offer a single, clean ground truth, which hides real-world expert disagreement, or they provide multiple annotations without a separate gold standard for objective evaluation. To bridge this gap, we introduce CytoCrowd, a new public benchmark for cytology analysis. The dataset features 446 high-resolution images, each with two key components: (1) raw, conflicting annotations from four independent pathologists, and (2) a separate, high-quality gold-standard ground truth established by a senior expert. This dual structure makes CytoCrowd a versatile resource. It serves as a benchmark for standard computer vision tasks, such as object detection and classification, using the ground truth. Simultaneously, it provides a realistic testbed for evaluating annotation aggregation algorithms that must resolve expert disagreements. We provide comprehensive baseline results for both tasks. Our experiments demonstrate the challenges presented by CytoCrowd and establish its value as a resource for developing the next generation of models for medical image analysis.
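
As a minimal companion to the aggregation task described above, the sketch below shows one of the simplest possible fusion strategies, a per-object majority vote over the four annotators, scored against the gold standard. This is not the paper's baseline; the `gold` and `labels` arrays are simulated placeholders for the released annotations.

```python
# Illustrative sketch: majority-vote label aggregation evaluated against
# a gold standard. All data here is simulated, not from CytoCrowd.
from collections import Counter

import numpy as np

rng = np.random.default_rng(1)
n_objects, n_classes, n_annotators = 200, 4, 4

# Simulated gold-standard labels and noisy annotators that re-sample
# roughly 20% of their labels uniformly at random.
gold = rng.integers(0, n_classes, size=n_objects)

def noisy_copy(reference, flip_rate=0.2):
    out = reference.copy()
    flip = rng.random(len(reference)) < flip_rate
    out[flip] = rng.integers(0, n_classes, size=int(flip.sum()))
    return out

labels = [noisy_copy(gold) for _ in range(n_annotators)]

# Majority vote per object; ties go to the label encountered first.
votes = np.stack(labels, axis=1)
aggregated = np.array([Counter(row).most_common(1)[0][0] for row in votes])

accuracy = float((aggregated == gold).mean())
print(f"majority-vote accuracy vs. gold standard: {accuracy:.3f}")
```
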
Problem

Research questions and friction points this paper is trying to address.

medical image analysis
multi-annotator disagreement
gold-standard annotation
cytology dataset
annotation aggregation
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-annotator
gold standard
cytology image analysis
annotation disagreement
benchmark dataset
Yonghao Si
Sun Yat-sen University; Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China
Xingyuan Zeng
Sun Yat-sen University, Guangzhou, China
Zhao Chen
Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China
Libin Zheng
School of Artificial Intelligence, Sun Yat-sen University
Caleb Chen Cao
Senior Manager (R&D), Big Data Institute, HKUST
Data-centric AI, Human Computation
Lei Chen
Hong Kong University of Science and Technology
Human-Powered Machine Learning, Databases, Data Mining
Jian Yin
Sun Yat-sen University, Guangzhou, China