ScaleMAI: Accelerating the Development of Trusted Datasets and AI Models

📅 2025-01-06
📈 Citations: 1
Influential: 0
🤖 AI Summary
Medical AI development is hindered by lengthy dataset curation cycles and by the decoupling of annotation from model training. To address this, we propose an AI-driven framework in which medical data and models co-evolve, instantiated on pancreatic tumor CT analysis. Our approach introduces a progressive, human-in-the-loop “data flywheel” that jointly improves annotation quality and model performance. Methodologically, it integrates multi-round human-in-the-loop iteration, 3D voxel-level semi-automatic annotation, domain-adaptive few-shot learning, and cross-task joint modeling (detection, segmentation, classification). We construct a high-quality, multi-task dataset of 25,362 CT scans. Our Flagship Model approaches the proficiency of expert annotators with 30 years of experience in detecting pancreatic tumors, and it outperforms models developed from smaller, fixed-quality datasets by 14%, 5%, and 72% on detection, segmentation, and classification benchmarks, respectively. Rather than treating the dataset as a static artifact, this work enables dynamic, scalable, and trustworthy medical AI infrastructure.

📝 Abstract
Building trusted datasets is critical for transparent and responsible Medical AI (MAI) research, but creating even small, high-quality datasets can take years of effort from multidisciplinary teams. This process often delays AI benefits, as human-centric data creation and AI-centric model development are treated as separate, sequential steps. To overcome this, we propose ScaleMAI, an agent for AI-integrated data curation and annotation that allows data quality and AI performance to improve in a self-reinforcing cycle, reducing development time from years to months. We adopt pancreatic tumor detection as an example. First, ScaleMAI progressively creates a dataset of 25,362 CT scans, including per-voxel annotations for benign/malignant tumors and 24 anatomical structures. Second, through progressive human-in-the-loop iterations, ScaleMAI produces a Flagship AI Model that can approach the proficiency of expert annotators (with 30 years of experience) in detecting pancreatic tumors. The Flagship Model significantly outperforms models developed from smaller, fixed-quality datasets, with substantial gains in tumor detection (+14%), segmentation (+5%), and classification (+72%) on three prestigious benchmarks. In summary, ScaleMAI transforms the speed, scale, and reliability of medical dataset creation, paving the way for a variety of impactful, data-driven applications.
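The abstract describes data quality and model performance improving together through repeated human-in-the-loop rounds. The following is a minimal, hypothetical sketch of such a loop, not ScaleMAI's actual implementation: the names `Case`, `flywheel`, `predict`, `review`, and `retrain`, the fixed per-round review `budget`, and the choice to send the least-confident AI proposals to human reviewers first are all illustrative assumptions. Each round, the current model proposes labels for unreviewed cases, humans verify a limited number of them, and the model is retrained on everything verified so far.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Case:
    case_id: int
    label: Optional[int] = None  # current label (AI-proposed or human-verified)
    verified: bool = False       # True once a human reviewer has checked it


def flywheel(
    cases: List[Case],
    predict: Callable[[Dict[int, int], Case], Tuple[int, float]],
    review: Callable[[Case], int],
    retrain: Callable[[List[Case]], Dict[int, int]],
    budget: int,
    rounds: int,
):
    """Hypothetical human-in-the-loop loop: propose, review, retrain."""
    model = retrain([])  # start from a model trained on no verified data
    history = []         # number of verified cases after each round
    for _ in range(rounds):
        # 1. The model proposes a label and a confidence for each unverified case.
        proposals = []
        for case in cases:
            if not case.verified:
                label, confidence = predict(model, case)
                case.label = label
                proposals.append((confidence, case))
        # 2. Humans review the least-confident proposals first (assumed heuristic).
        proposals.sort(key=lambda p: p[0])
        for _, case in proposals[:budget]:
            case.label = review(case)
            case.verified = True
        # 3. Retrain on every human-verified case accumulated so far.
        model = retrain([c for c in cases if c.verified])
        history.append(sum(c.verified for c in cases))
    return model, history


# Toy usage with made-up data: ground truth is the parity of the case id.
truth = {i: i % 2 for i in range(10)}
cases = [Case(i) for i in range(10)]


def toy_retrain(verified: List[Case]) -> Dict[int, int]:
    # Toy "model": memorizes the labels of verified cases.
    return {c.case_id: c.label for c in verified}


def toy_predict(model: Dict[int, int], case: Case) -> Tuple[int, float]:
    # Toy prediction: always guesses 0, with a made-up confidence score.
    return model.get(case.case_id, 0), case.case_id / 10


def toy_review(case: Case) -> int:
    # A simulated expert returns the ground-truth label.
    return truth[case.case_id]


model, history = flywheel(cases, toy_predict, toy_review, toy_retrain, budget=3, rounds=3)
print(history)
```

With a budget of 3 cases per round, three rounds verify 9 of the 10 toy cases, and the last case keeps its unreviewed AI proposal. In a real system the review order, budget, and retraining schedule would be design choices that the listing does not specify.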
Problem

Research questions and friction points this paper is trying to address.

Medical AI
Dataset Creation
Time-consuming Process
Innovation

Methods, ideas, or system contributions that make the work stand out.

ScaleMAI
Medical Data Curation
AI Model Enhancement
Wenxuan Li
Johns Hopkins University
Imaging Informatics, Computer-aided Diagnosis
P. R. Bassi
University of Bologna, Italian Institute of Technology
Tianyu Lin
Johns Hopkins University
Medical Image Analysis, Computer Vision
Yu-Cheng Chou
Johns Hopkins University
MLLM, Reinforcement Learning, Computer Vision
Xinze Zhou
Johns Hopkins University
Yucheng Tang
Sr. Research Scientist at NVIDIA
3D Computer Vision, Vision-Language Model, Healthcare AI, Accelerated Computing
Fabian Isensee
HIP Applied Computer Vision Lab, Division of Medical Image Computing, German Cancer Research Center
Computer Vision, Deep Learning, Segmentation, Medical Image Computing
Kang Wang
University of California, San Francisco
Qi Chen
Johns Hopkins University, University of Chinese Academy of Sciences
Xiaowei Xu
Guangdong Provincial People’s Hospital
Xiaoxi Chen
University of Illinois Urbana-Champaign
Diagnostic Radiology, Translational Medicine, Quantitative Medical Imaging, AI in Medical Imaging
Lizhou Wu
National University of Defense Technology, China
Spintronic Design and Test, Memory Systems, Emerging Computing Paradigms
Qilong Wu
National University of Singapore
Yannick Kirchhoff
PhD Student, DKFZ
Computer Vision, Deep Learning, Medical Image Computing
Maximilian R. Rokuss
DKFZ
Saikat Roy
Doctoral Researcher, German Cancer Research Center (DKFZ)
Deep Learning, Image Segmentation, Representation Learning, Diffusion Models, Medical Image Analysis
Yuxuan Zhao
Qilu Hospital of Shandong University
Dexin Yu
Qilu Hospital of Shandong University
Kai Ding
Johns Hopkins Medicine
Constantin Ulrich
German Cancer Research Center (DKFZ)
Medical Image Computing, Medical Physics, Computer Vision
Klaus Maier-Hein
DKFZ
Yang Yang
University of California, San Francisco
Alan L. Yuille
Johns Hopkins University
Zongwei Zhou
Johns Hopkins University