Coupling AI and Citizen Science in Creation of Enhanced Training Dataset for Medical Image Segmentation

📅 2024-09-04
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Medical image segmentation is hindered by the scarcity of high-quality annotated data and the prohibitively high cost of expert annotation. To address this, we propose a crowdsourcing-enhanced framework that integrates artificial intelligence (AI) and citizen science. Our approach features a crowdsourcing annotation platform with cross-modal preprocessing; AI-assisted initial screening via MedSAM; high-fidelity synthetic data generation using pix2pixGAN; and a multi-source label fusion and quality assurance mechanism, together forming an end-to-end "annotate–optimize–synthesize–validate" pipeline. This framework alleviates the small-sample bottleneck: on multimodal medical imaging datasets, it achieves an average 12.3% improvement in segmentation Dice coefficient and a fivefold increase in annotation efficiency. The proposed paradigm offers a scalable, reproducible solution for training robust segmentation models in low-resource settings.
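The headline result is reported as a Dice coefficient improvement. For readers unfamiliar with the metric, a minimal sketch of how Dice is computed for a pair of binary segmentation masks (the function name and toy masks below are illustrative, not from the paper):

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-8) -> float:
    """Dice similarity coefficient between two binary masks: 2|A∩B| / (|A|+|B|)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    # eps guards against division by zero when both masks are empty
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# Toy example: two 4x4 masks that overlap on one row
a = np.zeros((4, 4), dtype=bool); a[:2, :] = True   # top two rows
b = np.zeros((4, 4), dtype=bool); b[1:3, :] = True  # middle two rows
print(round(dice_coefficient(a, b), 3))  # → 0.5
```

Dice ranges from 0 (no overlap) to 1 (perfect agreement), so a 12.3% average gain on this scale is a substantial improvement for segmentation quality.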

📝 Abstract
Recent advancements in medical imaging and artificial intelligence (AI) have greatly enhanced diagnostic capabilities, but the development of effective deep learning (DL) models is still constrained by the lack of high-quality annotated datasets. The traditional manual annotation process by medical experts is time- and resource-intensive, limiting the scalability of these datasets. In this work, we introduce a robust and versatile framework that combines AI and crowdsourcing to improve both the quality and quantity of medical image datasets across different modalities. Our approach utilises a user-friendly online platform that enables a diverse group of crowd annotators to label medical images efficiently. By integrating the MedSAM segmentation AI with this platform, we accelerate the annotation process while maintaining expert-level quality through an algorithm that merges crowd-labelled images. Additionally, we employ pix2pixGAN, a generative AI model, to expand the training dataset with synthetic images that capture realistic morphological features. These methods are combined into a cohesive framework designed to produce an enhanced dataset, which can serve as a universal pre-processing pipeline to boost the training of any medical deep learning segmentation model. Our results demonstrate that this framework significantly improves model performance, especially when training data is limited.
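The abstract describes "an algorithm that merges crowd-labelled images" to reach expert-level quality. The paper's exact fusion and quality-assurance mechanism is not given here; a minimal per-pixel majority-vote sketch, assuming binary masks of equal shape from several annotators (function name and threshold are illustrative assumptions), conveys the basic idea:

```python
import numpy as np

def majority_vote_fusion(masks: list[np.ndarray], threshold: float = 0.5) -> np.ndarray:
    """Fuse several binary annotations of the same image by per-pixel voting.

    A pixel is kept in the fused mask if at least `threshold` of the
    annotators marked it as foreground.
    """
    stack = np.stack([m.astype(float) for m in masks], axis=0)
    vote_fraction = stack.mean(axis=0)  # fraction of annotators marking each pixel
    return (vote_fraction >= threshold).astype(np.uint8)

# Three annotators disagree slightly on a 3x3 region
m1 = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 0]])
m2 = np.array([[1, 1, 1], [1, 0, 0], [0, 0, 0]])
m3 = np.array([[1, 0, 0], [1, 1, 0], [0, 0, 0]])
fused = majority_vote_fusion([m1, m2, m3])
# fused keeps only pixels marked by at least 2 of 3 annotators
```

More sophisticated fusion schemes (e.g. STAPLE-style weighting of annotator reliability) follow the same pattern but weight each annotator's vote by an estimated performance level.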
Problem

Research questions and friction points this paper is trying to address.

Lack of high-quality annotated medical image datasets
Time- and resource-intensive manual annotation process
Limited scalability of traditional annotation methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines AI and crowdsourcing for medical image annotation
Uses MedSAM and pix2pixGAN to enhance dataset quality
Integrates a user-friendly platform for efficient crowd labelling
👥 Authors
Amir Syahmi (Department of Bioengineering and Imperial-X, Imperial College London)
Xiangrong Lu (Department of Bioengineering and Imperial-X, Imperial College London)
Yinxuan Li (Department of Bioengineering and Imperial-X, Imperial College London)
Haoxuan Yao (Department of Bioengineering and Imperial-X, Imperial College London)
Hanjun Jiang (Tsinghua University)
Ishita Acharya (Department of Bioengineering and Imperial-X, Imperial College London)
Shiyi Wang (Imperial College London)
Yang Nan (Department of Bioengineering and Imperial-X, Imperial College London)
Xiaodan Xing (Department of Bioengineering and Imperial-X, Imperial College London)
Guang Yang (Department of Bioengineering and Imperial-X, Imperial College London)