🤖 AI Summary
Problem: Quantifying mediastinal lymph nodes in clinical 3D CT is hindered by the scarcity of fully annotated data, because labeling the many lymph nodes in each scan is time-consuming and annotations are often incomplete. Method: The Mediastinal Lymph Node Quantification (LNQ) challenge, held with MICCAI 2023, introduces a new partially annotated mediastinal lymph node 3D CT dataset and a standardized evaluation framework for benchmarking weakly supervised segmentation. 16 teams submitted to the validation leaderboard, and 6 teams took part in the evaluation phase. Contribution/Results: Weakly supervised methods achieved a median Dice score of 61.0%, while the top-ranked teams exceeded a median Dice of 70% by combining weak supervision with smaller, fully annotated datasets. The results show both the promise of weakly supervised segmentation under limited labeling budgets, for tasks such as cancer staging and treatment response assessment, and the continuing need for high-quality, fully annotated data.
📝 Abstract
Accurate assessment of lymph node size in 3D CT scans is crucial for cancer staging, therapeutic management, and monitoring treatment response. Existing state-of-the-art segmentation frameworks in medical imaging often rely on fully annotated datasets. However, for lymph node segmentation, these datasets are typically small due to the extensive time and expertise required to annotate the numerous lymph nodes in 3D CT scans. Weakly supervised learning, which leverages incomplete or noisy annotations, has recently attracted interest in the medical imaging community as a potential solution. Although a variety of weakly supervised techniques have been proposed, most have been validated only on private datasets or on small publicly available datasets. To address this limitation, the Mediastinal Lymph Node Quantification (LNQ) challenge was organized in conjunction with the 26th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2023). The challenge aimed to advance weakly supervised segmentation methods by providing a new, partially annotated dataset and a robust evaluation framework. A total of 16 teams from 5 countries submitted predictions to the validation leaderboard, and 6 teams from 3 countries participated in the evaluation phase. The results highlighted both the potential and the current limitations of weakly supervised approaches. On the one hand, weakly supervised approaches obtained relatively good performance, with a median Dice score of $61.0\%$. On the other hand, the top-ranked teams, with a median Dice score exceeding $70\%$, boosted their performance by leveraging smaller but fully annotated datasets to combine weak supervision and full supervision. This highlights both the promise of weakly supervised methods and the ongoing need for high-quality, fully annotated data to achieve higher segmentation performance.
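The results above are reported as a median Dice score across cases. As a minimal sketch, assuming binary 3D masks of identical shape and the standard Dice definition $\mathrm{Dice}(P, R) = 2|P \cap R| / (|P| + |R|)$, per-case scores and their median can be computed as follows. The function names `dice_score` and `median_dice` are hypothetical, and scoring a case where both masks are empty as 1.0 is an assumed convention; the challenge's official evaluation code may handle such cases differently.

```python
import numpy as np


def dice_score(pred: np.ndarray, ref: np.ndarray) -> float:
    """Dice similarity coefficient between two binary 3D masks.

    Assumption (hypothetical convention): if both masks are empty,
    the case is scored 1.0.
    """
    pred = pred.astype(bool)
    ref = ref.astype(bool)
    if pred.shape != ref.shape:
        raise ValueError(f"shape mismatch: {pred.shape} vs {ref.shape}")
    denom = int(pred.sum()) + int(ref.sum())
    if denom == 0:
        return 1.0
    intersection = int(np.logical_and(pred, ref).sum())
    return 2.0 * intersection / denom


def median_dice(cases) -> float:
    """Median of per-case Dice scores over (prediction, reference) pairs."""
    scores = [dice_score(pred, ref) for pred, ref in cases]
    return float(np.median(scores))


if __name__ == "__main__":
    # Toy 4x4x4 volumes standing in for CT segmentation masks.
    pred = np.zeros((4, 4, 4), dtype=bool)
    pred[1:3, 1:3, 1:3] = True   # 8 voxels
    ref = np.zeros((4, 4, 4), dtype=bool)
    ref[1:3, 1:3, 1:4] = True    # 12 voxels, contains all of pred
    empty = np.zeros((4, 4, 4), dtype=bool)

    print(dice_score(pred, ref))  # 2 * 8 / (8 + 12) = 0.8
    print(median_dice([(pred, ref), (pred, pred), (pred, empty)]))
```

Taking the median rather than the mean makes the summary less sensitive to a few cases with very low Dice, such as scans where a method misses most nodes. A Dice of 0.610 corresponds to the $61.0\%$ reported above.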