A COCO-Formatted Instance-Level Dataset for Plasmodium Falciparum Detection in Giemsa-Stained Blood Smears

📅 2025-07-24
📈 Citations: 0
✹ Influential: 0
🤖 AI Summary
To address the scarcity of instance-level annotated data for *Plasmodium falciparum* detection in malaria diagnosis, this work extends the publicly available NIH malaria dataset with COCO-formatted, instance-level bounding-box annotations covering infected and non-infected red blood cells and white blood cells. The annotations were produced by combining automated pre-screening with expert-led manual correction, improving annotation consistency and semantic accuracy. Using this dataset, the authors run five-fold cross-validation with Faster R-CNN for instance-level detection and reach F1 scores of up to 0.88 for infected cells, showing that the annotations are sufficient to train high-performing deep learning models. The dataset is publicly released under an open license, providing a standardized benchmark and a reproducible technical foundation for AI-driven automated malaria diagnosis in low-resource settings.
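The summary mentions five-fold cross-validation but not how the folds are formed. The sketch below assumes the split is made at the image level, so that all cells annotated in one smear image fall into the same fold and no image contributes to both training and validation within a split. `k_fold_splits` is a hypothetical helper for illustration, not code from the released repository.

```python
import random
from typing import List, Sequence, Tuple


def k_fold_splits(
    image_ids: Sequence[int], k: int = 5, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_ids, val_ids) pairs; every image is validated exactly once.

    Hypothetical helper: splitting by image (not by cell) keeps all cells of
    one smear image in the same fold.
    """
    if k < 2 or k > len(image_ids):
        raise ValueError("k must be between 2 and the number of images")
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)  # fixed seed -> reproducible folds
    # Round-robin assignment: fold sizes differ by at most one image.
    folds = [ids[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        val = sorted(folds[i])
        train = sorted(x for j, fold in enumerate(folds) if j != i for x in fold)
        splits.append((train, val))
    return splits


if __name__ == "__main__":
    for fold, (train, val) in enumerate(k_fold_splits(range(10), k=5)):
        print(f"fold {fold}: {len(train)} train / {len(val)} val images")
```

A detector is then trained once per fold on `train` and evaluated on `val`, and the per-fold scores are aggregated.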


📝 Abstract
Accurate detection of Plasmodium falciparum in Giemsa-stained blood smears is an essential component of reliable malaria diagnosis, especially in developing countries. Deep learning-based object detection methods have demonstrated strong potential for automated malaria diagnosis, but their adoption is limited by the scarcity of datasets with detailed instance-level annotations. In this work, we present an enhanced version of the publicly available NIH malaria dataset, with detailed bounding box annotations in COCO format to support object detection training. We validated the revised annotations by training a Faster R-CNN model to detect infected and non-infected red blood cells, as well as white blood cells. Cross-validation on the original dataset yielded F1 scores of up to 0.88 for infected cell detection. These results underscore the importance of annotation volume and consistency, and demonstrate that automated annotation refinement combined with targeted manual correction can produce training data of sufficient quality for robust detection performance. The updated annotation set is publicly available via GitHub: https://github.com/MIRA-Vision-Microscopy/malaria-thin-smear-coco.
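The abstract reports F1 scores for infected-cell detection without stating the matching criterion. As a minimal sketch, assuming a prediction counts as a true positive when it overlaps a not-yet-matched ground-truth box with IoU of at least 0.5 (a common convention, not confirmed by the source), F1 can be computed from greedy one-to-one matching. Boxes use COCO's `[x, y, width, height]` layout; `iou` and `detection_f1` are hypothetical names.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # COCO bbox: (x, y, width, height)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two COCO-style boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def detection_f1(
    preds: Sequence[Tuple[Box, float]], gts: Sequence[Box], iou_thr: float = 0.5
) -> float:
    """F1 for one class from greedy one-to-one matching (assumed criterion).

    Predictions are visited in descending score order; each is matched to the
    unmatched ground-truth box with the highest IoU, provided IoU >= iou_thr.
    Returns 0.0 when there are neither predictions nor ground-truth boxes.
    """
    matched = set()
    tp = 0
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_j, best_iou = -1, iou_thr
        for j, gt in enumerate(gts):
            if j in matched:
                continue
            overlap = iou(box, gt)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched.add(best_j)
            tp += 1
    fp = len(preds) - tp  # unmatched predictions
    fn = len(gts) - tp    # missed ground-truth cells
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0
```

For a per-class score such as the infected-cell F1, call `detection_f1` with only that class's predictions and annotations. The expression 2TP / (2TP + FP + FN) is algebraically the harmonic mean of precision and recall.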
Problem

Research questions and friction points this paper is trying to address.

Detecting Plasmodium falciparum in blood smears accurately
Addressing scarcity of instance-level annotated malaria datasets
Improving annotation quality for robust detection performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Enhanced NIH malaria dataset with COCO annotations
Validated the annotations with Faster R-CNN detection of infected cells
Automated and manual annotation refinement for quality
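To illustrate what "COCO format" means here: a COCO detection file is a JSON object with `images`, `categories`, and `annotations` lists, where each annotation links an `image_id` to a `category_id` and stores its box as `[x, y, width, height]`. The file below is a hypothetical minimal example; the category names, IDs, and file name are assumptions and may differ from those in the released GitHub repository.

```python
import json
from collections import Counter
from typing import Dict, Sequence, Tuple

# Hypothetical minimal COCO file; names and IDs are illustrative only.
EXAMPLE = """
{
  "images": [{"id": 1, "file_name": "smear_001.png", "width": 1000, "height": 800}],
  "categories": [
    {"id": 1, "name": "infected"},
    {"id": 2, "name": "uninfected"},
    {"id": 3, "name": "white_blood_cell"}
  ],
  "annotations": [
    {"id": 1, "image_id": 1, "category_id": 1, "bbox": [10, 20, 40, 40], "area": 1600, "iscrowd": 0},
    {"id": 2, "image_id": 1, "category_id": 2, "bbox": [80, 30, 42, 38], "area": 1596, "iscrowd": 0},
    {"id": 3, "image_id": 1, "category_id": 2, "bbox": [150, 60, 39, 41], "area": 1599, "iscrowd": 0},
    {"id": 4, "image_id": 1, "category_id": 3, "bbox": [300, 200, 60, 58], "area": 3480, "iscrowd": 0}
  ]
}
"""


def count_instances(coco: dict) -> Dict[str, int]:
    """Count annotated instances per category name."""
    names = {c["id"]: c["name"] for c in coco["categories"]}
    counts = Counter(names[a["category_id"]] for a in coco["annotations"])
    # Include categories with zero instances so every class appears.
    return {name: counts.get(name, 0) for name in names.values()}


def to_corners(bbox: Sequence[float]) -> Tuple[float, float, float, float]:
    """Convert COCO [x, y, width, height] to [x1, y1, x2, y2]."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)


if __name__ == "__main__":
    coco = json.loads(EXAMPLE)
    print(count_instances(coco))
    print([to_corners(a["bbox"]) for a in coco["annotations"]])
```

The corner conversion is included because many detector implementations, including torchvision's Faster R-CNN, expect boxes as `[x1, y1, x2, y2]` rather than COCO's width/height form.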
Frauke Wilm
Pattern Recognition Lab, Department of Computer Science, Friedrich-Alexander-Universität Erlangen
Digital Pathology, Machine Learning, Pattern Recognition
Luis Carlos Rivera Monroy
MIRA Vision Microscopy GmbH, 73037 Göppingen, Germany; Pattern Recognition Lab, Department of Computer Science, Friedrich-Alexander-Universität (FAU) Erlangen-Nürnberg, Erlangen, Germany
Mathias Öttl
MIRA Vision Microscopy GmbH, 73037 Göppingen, Germany; Pattern Recognition Lab, Department of Computer Science, Friedrich-Alexander-Universität (FAU) Erlangen-Nürnberg, Erlangen, Germany
Lukas Mürdter
MIRA Vision Microscopy GmbH, 73037 Göppingen, Germany
Leonid Mill
MIRA Vision
Andreas Maier
Pattern Recognition Lab, Department of Computer Science, Friedrich-Alexander-UniversitĂ€t (FAU) Erlangen-NĂŒrnberg, Erlangen, Germany