🤖 AI Summary
Multi-organ medical image segmentation suffers from prohibitively high annotation costs, and existing public datasets are limited in scale and organ coverage. To address this, the authors propose a closed-loop active learning framework that combines label-bias reduction across multiple pre-trained models, uncertainty-driven active sampling, interactive correction guided by saliency heatmaps, and a taxonomy of common errors, so that AI predictions and human annotations are revised iteratively. Within three weeks, the framework was used to build the largest abdominal CT multi-organ segmentation dataset to date, comprising 8,448 CT volumes, 3.2 million slices, and annotations for eight organs. Relative to the estimated 1,600 weeks (about 31 years) of expert-only labeling, this is a speedup of roughly 530×. Annotation quality is reported as similar to or better than that of conventional annotation. The dataset is publicly released as a benchmark resource for large-scale medical image analysis.
📝 Abstract
Annotating medical images, particularly for organ segmentation, is laborious and time-consuming. For example, annotating a single abdominal organ takes an estimated 30-60 minutes per CT volume, depending on the annotator's expertise and the size, visibility, and complexity of the organ. As a result, publicly available datasets for multi-organ segmentation are often limited in data size and organ diversity. This paper proposes an active learning method to expedite the annotation process for organ segmentation and creates the largest multi-organ dataset to date, with the spleen, liver, kidneys, stomach, gallbladder, pancreas, aorta, and inferior vena cava (IVC) annotated in 8,448 CT volumes, totaling 3.2 million slices. Conventional annotation methods would take an experienced annotator up to 1,600 weeks (roughly 30.8 years) to complete this task. In contrast, our annotation method accomplished it in three weeks (based on an 8-hour workday, five days a week) while maintaining similar or even better annotation quality. This achievement is attributed to three unique properties of our method: (1) label bias reduction using multiple pre-trained segmentation models, (2) effective error detection in the model predictions, and (3) attention guidance that directs annotators to correct the most salient errors. Furthermore, we summarize the taxonomy of common errors made by AI algorithms and annotators. This allows for continuous revision of both the AI models and the annotations, and significantly reduces the annotation cost of creating large-scale datasets for a wider variety of medical imaging tasks.
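The time figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the abstract (1,600 weeks, three weeks, 8-hour days, five days a week); the 52-week working year and the function names are assumptions made for illustration, not taken from the paper. Under those assumptions, 1,600 weeks comes to about 30.8 years, and the reported three weeks corresponds to a speedup of roughly 533×.

```python
# Figures quoted in the abstract.
MANUAL_WEEKS = 1_600      # upper estimate for one experienced annotator
ASSISTED_WEEKS = 3        # reported duration with the active learning method
HOURS_PER_DAY = 8         # 8-hour workday
DAYS_PER_WEEK = 5         # five working days a week

# Assumption (not stated in the abstract): a 52-week working year.
WEEKS_PER_YEAR = 52


def weeks_to_years(weeks: float) -> float:
    """Convert working weeks to years under the 52-week assumption."""
    return weeks / WEEKS_PER_YEAR


def weeks_to_hours(weeks: float) -> float:
    """Convert working weeks to working hours."""
    return weeks * DAYS_PER_WEEK * HOURS_PER_DAY


def speedup(manual_weeks: float, assisted_weeks: float) -> float:
    """Ratio of conventional annotation time to assisted annotation time."""
    return manual_weeks / assisted_weeks


manual_years = weeks_to_years(MANUAL_WEEKS)
manual_hours = weeks_to_hours(MANUAL_WEEKS)
assisted_hours = weeks_to_hours(ASSISTED_WEEKS)
ratio = speedup(MANUAL_WEEKS, ASSISTED_WEEKS)

print(f"Conventional estimate: {manual_years:.1f} years ({manual_hours:,.0f} hours)")
print(f"Assisted annotation:   {assisted_hours:,.0f} hours")
print(f"Speedup:               {ratio:.0f}x")
```

The ratio is computed from the upper-bound estimate of 1,600 weeks, so it should be read as an upper estimate of the speedup rather than a measured value.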