AI Summary
Problem: Universal lesion detection and tagging (ULDT) in CT scans is typically developed on the DeepLesion dataset, which is incompletely annotated and severely class-imbalanced, hindering robust multi-class lesion detection. Method: We propose a multi-round self-training framework built on the VFNet detector. It combines a variable confidence-threshold policy for pseudo-label selection, upsampling of mined lesions, and iterative refinement of pseudo-labels, all without additional manual annotation. Contribution/Results: Under a 4 false positives per scan (4 FP) constraint, overall sensitivity reaches 78.5%, an absolute gain of 6.5% over self-training without class balancing (72%) and of 11.7% over the same self-training policy without upsampling (66.8%). Sensitivity at 4 FP is improved or maintained for all eight lesion classes, so class balancing avoids the trade-off seen in direct self-training, where over-represented classes improve at the expense of under-represented ones.
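To make the "sensitivity at 4 FP" figure concrete, the following is a minimal sketch of how such an operating point can be computed from scored detections. It is an illustration under stated assumptions, not the authors' evaluation code: the function name `sensitivity_at_fp`, the `(score, is_true_positive)` input format, and the assumption that each true-positive detection has already been matched to a distinct ground-truth lesion are all hypothetical.

```python
from typing import List, Tuple


def sensitivity_at_fp(
    detections: List[Tuple[float, bool]],
    num_lesions: int,
    num_scans: int,
    fp_per_scan: float = 4.0,
) -> float:
    """Best sensitivity reachable while average false positives per scan
    stay at or below `fp_per_scan`.

    Assumptions (hypothetical, for illustration): `detections` pools all
    scans as (confidence score, is_true_positive) pairs, box-to-lesion
    matching was done upstream, and scores are distinct.
    """
    max_fp = fp_per_scan * num_scans
    tp = 0
    fp = 0
    best = 0.0
    # Sweep the confidence threshold from the highest score downwards.
    for _score, is_tp in sorted(detections, key=lambda d: d[0], reverse=True):
        if is_tp:
            tp += 1
        else:
            fp += 1
        if fp > max_fp:
            break  # lowering the threshold further exceeds the FP budget
        best = tp / num_lesions
    return best
```

For example, with one scan, four ground-truth lesions, and a budget of one false positive, the detections `[(0.9, True), (0.8, False), (0.7, True), (0.6, False), (0.5, True)]` give a sensitivity of 0.5, because the second false positive ends the sweep before the third true positive is counted.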
Abstract
Universal lesion detection and tagging (ULDT) in CT studies is critical for tumor burden assessment and for tracking the progression of lesion status (growth/shrinkage) over time. However, a lack of fully annotated data hinders the development of effective ULDT approaches. Prior work used the DeepLesion dataset (4,427 patients, 10,594 studies, 32,120 CT slices, 32,735 lesions, 8 body part labels) for algorithmic development, but this dataset is not completely annotated and contains class imbalances. To address these issues, we developed a self-training pipeline for ULDT. A VFNet model was trained on a limited 11.5% subset of DeepLesion (bounding boxes + tags) to detect and classify lesions in CT studies. It then identified novel lesion candidates in a larger unseen data subset, incorporated them into its training set, and was retrained over multiple rounds. Multiple self-training experiments were conducted with different threshold policies to select higher-quality predicted lesions and to address the class imbalance. We found that direct self-training improved the sensitivities of over-represented lesion classes at the expense of under-represented classes. However, upsampling the lesions mined during self-training, combined with a variable threshold policy, yielded a 6.5% absolute increase in sensitivity at 4 FP over self-training without class balancing (72% vs 78.5%) and an 11.7% absolute increase over the same self-training policy without upsampling (66.8% vs 78.5%). Furthermore, we show that our results either improved or maintained the sensitivity at 4 FP for all 8 lesion classes.
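One round of the mining step described above can be sketched as follows. This is a hedged illustration, not the authors' pipeline: the abstract does not define the variable threshold policy or the upsampling rule, so the per-class thresholds, the "repeat mined lesions until each class matches the largest class" rule, and the names `Candidate`, `select_pseudo_labels`, and `upsample_to_balance` are all assumptions made for this example.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, NamedTuple


class Candidate(NamedTuple):
    image_id: str
    label: str    # body-part tag, e.g. "lung" or "bone"
    score: float  # detector confidence


def select_pseudo_labels(
    cands: List[Candidate],
    thresholds: Dict[str, float],
    default: float = 0.9,
) -> List[Candidate]:
    """Keep predictions whose score clears the threshold for their class.

    Assumption: the variable threshold policy is modeled as one threshold
    per class, with `default` used for classes missing from `thresholds`.
    """
    return [c for c in cands if c.score >= thresholds.get(c.label, default)]


def upsample_to_balance(
    labeled: List[Candidate],
    mined: List[Candidate],
    rng: random.Random,
) -> List[Candidate]:
    """Repeat mined lesions of rarer classes (sampling with replacement)
    until each class with mined lesions matches the largest class count.
    Classes with no mined lesions are left unchanged.
    """
    counts = Counter(c.label for c in labeled + mined)
    target = max(counts.values())
    by_class = defaultdict(list)
    for c in mined:
        by_class[c.label].append(c)
    extra: List[Candidate] = []
    for label, pool in by_class.items():
        deficit = target - counts[label]  # never negative: target is the max
        extra.extend(rng.choices(pool, k=deficit))
    return labeled + mined + extra
```

In a full self-training loop, the returned list would serve as the training set for the next round, after which the retrained detector is applied again to the unlabeled subset. Applying a lower threshold to rare classes admits more of their candidates, while the upsampling step keeps the frequent classes from dominating the enlarged training set.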