Correcting class imbalances with self-training for improved universal lesion detection and tagging

📅 2023-04-07
🏛️ Medical Imaging
📈 Citations: 2
✨ Influential: 0
🤖 AI Summary
In universal lesion detection and tagging (ULDT) from CT scans, the DeepLesion dataset suffers from incomplete annotations and severe class imbalance, hindering robust multi-class lesion detection. Method: We propose a multi-round self-training framework built on the VFNet detector, incorporating a variable confidence-threshold policy for pseudo-label selection, upsampling of under-represented lesion classes, and iterative refinement of pseudo-labels, entirely without additional manual annotation. Contribution/Results: To our knowledge, this is the first method to improve or maintain sensitivity across all eight lesion classes at 4 false positives (4 FP). The overall sensitivity reaches 78.5%, an absolute gain of 11.7% over the same self-training policy without upsampling (66.8%) and 6.5% over self-training without class balancing (72%). Critically, detection performance does not degrade for any lesion class, and gains are especially pronounced for minority classes. The approach enhances model generalizability and clinical applicability.
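The multi-round loop described above can be sketched in a few lines. This is a minimal, self-contained illustration with toy data structures; the function names, the linear threshold schedule, and the dictionary-based predictions are assumptions for exposition, not the paper's actual VFNet pipeline.

```python
def variable_threshold(round_idx, base=0.65, step=0.05, floor=0.40):
    """Assumed variable-threshold policy: relax the pseudo-label
    confidence cutoff as self-training rounds progress."""
    return max(base - step * round_idx, floor)

def select_pseudo_labels(predictions, threshold):
    """Keep only predicted lesions whose confidence clears the cutoff."""
    return [p for p in predictions if p["score"] >= threshold]

def self_train(train_set, preds_per_round, rounds=3):
    """Grow the training set with mined pseudo-labels over multiple rounds.
    In the real pipeline the detector would be retrained each round."""
    train = list(train_set)
    for r in range(rounds):
        cutoff = variable_threshold(r)
        train.extend(select_pseudo_labels(preds_per_round[r], cutoff))
    return train

# Toy example: detector predictions on unseen data over three rounds.
preds = [
    [{"cls": "lung", "score": 0.70}, {"cls": "liver", "score": 0.50}],
    [{"cls": "bone", "score": 0.62}],
    [{"cls": "kidney", "score": 0.56}],
]
grown = self_train([{"cls": "lung", "score": 1.0}], preds)
# The low-confidence liver candidate (0.50) is rejected in round 0;
# later rounds accept lower scores as the threshold relaxes.
```

The relaxing schedule mirrors the intuition that later-round models are better calibrated, so lower-confidence candidates become safer to mine.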

๐Ÿ“ Abstract
Universal lesion detection and tagging (ULDT) in CT studies is critical for tumor burden assessment and tracking the progression of lesion status (growth/shrinkage) over time. However, a lack of fully annotated data hinders the development of effective ULDT approaches. Prior work used the DeepLesion dataset (4,427 patients, 10,594 studies, 32,120 CT slices, 32,735 lesions, 8 body part labels) for algorithmic development, but this dataset is not completely annotated and contains class imbalances. To address these issues, in this work, we developed a self-training pipeline for ULDT. A VFNet model was trained on a limited 11.5% subset of DeepLesion (bounding boxes + tags) to detect and classify lesions in CT studies. It then identified and incorporated novel lesion candidates from a larger unseen data subset into its training set, and self-trained over multiple rounds. Multiple self-training experiments were conducted with different threshold policies to select predicted lesions of higher quality and mitigate the class imbalances. We discovered that direct self-training improved the sensitivities of over-represented lesion classes at the expense of under-represented classes. However, upsampling the lesions mined during self-training along with a variable threshold policy yielded a 6.5% increase in sensitivity at 4 FP in contrast to self-training without class balancing (72% vs 78.5%) and an 11.7% increase compared to the same self-training policy without upsampling (66.8% vs 78.5%). Furthermore, we show that our results either improved or maintained the sensitivity at 4 FP for all 8 lesion classes.
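The class-balancing step in the abstract, upsampling mined lesions from under-represented classes, can be illustrated by replication up to the majority-class count. This is a hedged sketch under assumed data structures; the function name and the exact balancing target are illustrative, not taken from the paper.

```python
from collections import Counter

def upsample_minority(lesions):
    """Replicate under-represented lesion classes until each class
    roughly matches the size of the largest class (assumed policy)."""
    counts = Counter(l["cls"] for l in lesions)
    target = max(counts.values())
    balanced = []
    for cls, n in counts.items():
        members = [l for l in lesions if l["cls"] == cls]
        reps, rem = divmod(target, n)
        # Whole-set repetitions plus a partial slice to hit the target count.
        balanced.extend(members * reps + members[:rem])
    return balanced

# Toy mined set: lung dominates, kidney and pelvis are minorities.
mined = [{"cls": "lung"}] * 6 + [{"cls": "kidney"}] * 2 + [{"cls": "pelvis"}]
balanced = upsample_minority(mined)
```

After balancing, each class contributes equally to the next training round, which is what lets the minority-class sensitivities improve without sacrificing the majority classes.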
Problem

Research questions and friction points this paper is trying to address.

Addressing class imbalances in universal lesion detection and tagging
Improving lesion detection sensitivity using self-training with limited annotated data
Enhancing performance for underrepresented lesion classes via upsampling and threshold policies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-training pipeline for lesion detection and tagging
VFNet model trained on limited annotated subset
Upsampling mined lesions with variable threshold policy
A. Shieh
Imaging Biomarkers and Computer-Aided Diagnosis Laboratory, Radiology and Imaging Sciences, Clinical Center, National Institutes of Health, Bethesda MD, USA
T. Mathai
Imaging Biomarkers and Computer-Aided Diagnosis Laboratory, Radiology and Imaging Sciences, Clinical Center, National Institutes of Health, Bethesda MD, USA
Jianfei Liu
National Institutes of Health
Angshuman Paul
Indian Institute of Technology, Jodhpur, Rajasthan, India
R. Summers
Imaging Biomarkers and Computer-Aided Diagnosis Laboratory, Radiology and Imaging Sciences, Clinical Center, National Institutes of Health, Bethesda MD, USA