🤖 AI Summary
To address the scarcity of lymph node (LN) segmentation annotations in head-and-neck CT imaging, this work introduces the first large-scale LN annotation dataset, comprising 36,106 annotated LNs. We propose Dynamic Gradient Sparsification Training (DGST), a few-shot fine-tuning method that adaptively identifies and updates the most sensitive parameter subset via hierarchical freezing and gradient importance estimation, thereby preserving the foundation model's knowledge while enhancing task specificity. Implemented within the nnUNetv2 framework, DGST achieves state-of-the-art performance on the SegRap2023 and LNQ2023 benchmarks: with only 1–5 annotated cases per domain, it surpasses leading few-shot segmentation methods by over 8.2 percentage points in Dice score, demonstrating strong generalizability. To foster reproducibility and advance foundation models for medical image segmentation, we publicly release the dataset, trained models, and source code.
📝 Abstract
Accurate lymph node (LN) segmentation is critical for radiotherapy treatment and prognosis analysis, but is limited by the need for large annotated datasets. While deep learning-based segmentation foundation models show potential for building high-performing models from fewer samples, their adaptation to medical imaging is hindered by the lack of LN-specific priors and by inefficient few-shot fine-tuning in complex clinical practice, underscoring the need for an LN segmentation foundation model. In this work, we annotated 36,106 visible LNs from 3,346 publicly available head-and-neck CT scans to establish a robust LN segmentation model (nnUNetv2). Building on this, we propose Dynamic Gradient Sparsification Training (DGST), a few-shot fine-tuning approach that preserves foundational knowledge while dynamically updating the parameters of the LN segmentation model that are most critical, using only a few annotations. We validate it on two publicly available LN segmentation datasets, SegRap2023 and LNQ2023. The results show that DGST outperforms existing few-shot fine-tuning methods, achieving satisfactory performance with limited labeled data. We release the dataset, models, and all implementations to facilitate related research: https://github.com/Zihaoluoh/LN-Seg-FM.
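The core idea behind gradient sparsification during fine-tuning can be sketched in a few lines. The snippet below is a simplified illustration, not the paper's implementation: it keeps only the largest-magnitude entries of a gradient vector (a stand-in for the "most sensitive" parameters selected by gradient importance), zeroes the rest, and applies a plain gradient-descent step. The function name `sparsify_gradient`, the sparsity level, and the toy parameter values are all illustrative assumptions.

```python
import numpy as np

def sparsify_gradient(grad, sparsity=0.9):
    """Keep only the top-(1 - sparsity) fraction of gradient entries
    by magnitude, zeroing the rest. A minimal sketch of sparse
    parameter updating; DGST's actual selection is more involved
    (hierarchical freezing + gradient importance estimation)."""
    flat = np.abs(grad).ravel()
    k = max(1, int(flat.size * (1.0 - sparsity)))
    threshold = np.partition(flat, -k)[-k]  # k-th largest magnitude
    mask = np.abs(grad) >= threshold
    return grad * mask, mask

# One illustrative update step on a toy parameter vector.
params = np.array([1.0, -2.0, 0.5, 3.0])
grad = np.array([0.1, -0.8, 0.05, 0.4])
sparse_grad, mask = sparsify_gradient(grad, sparsity=0.5)
params -= 0.1 * sparse_grad  # only the top-k entries move
```

With `sparsity=0.5` on four entries, only the two largest-magnitude gradient components survive, so the other parameters are left untouched — the mechanism by which pretrained knowledge is preserved while the most task-relevant weights adapt.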