🤖 AI Summary
Accurate MRI segmentation of vestibular schwannomas (VS) is critical for clinical management, yet existing deep learning methods suffer from poor generalizability and reliance on labor-intensive manual annotation. To address this, we introduce a multi-center, longitudinal VS dataset comprising 534 T1-weighted contrast-enhanced scans, and propose a human-in-the-loop, closed-loop training paradigm that integrates expert consensus labeling, iterative deep learning-based segmentation, and human quality feedback to enable continuous model refinement. Our approach improves annotation efficiency by an estimated 37.4% over fully manual labeling and raises the internal validation Dice score from 0.9125 to 0.9670 (+5.45 percentage points), with stable performance on external datasets. Key contributions include: (1) open-sourcing a high-quality longitudinal VS dataset; (2) establishing a scalable, human-in-the-loop segmentation framework; and (3) empirically validating that consensus-driven labeling enhances both model generalizability and clinical trustworthiness.
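To make the closed-loop paradigm concrete, here is a minimal Python sketch of one plausible reading of the iteration, not the authors' actual pipeline: the callables `train_model`, `predict_mask`, and `expert_review` are hypothetical placeholders supplied by the caller.

```python
def human_in_the_loop_training(unlabelled_scans, seed_labels,
                               train_model, predict_mask, expert_review,
                               rounds=3):
    """Iteratively grow a consensus-labelled pool under expert quality control.

    unlabelled_scans: dict mapping scan_id -> image volume
    seed_labels:      dict mapping scan_id -> expert consensus mask
    train_model, predict_mask, expert_review: caller-supplied callables
    (hypothetical placeholders, not the authors' implementation)
    """
    labelled = dict(seed_labels)       # scan_id -> accepted mask
    model = train_model(labelled)      # bootstrap from seed annotations

    for _ in range(rounds):
        # 1. The current model proposes masks for still-unlabelled scans.
        proposals = {sid: predict_mask(model, scan)
                     for sid, scan in unlabelled_scans.items()
                     if sid not in labelled}

        # 2. Experts accept or correct each proposal; correcting a proposal
        #    is typically faster than annotating from scratch, which is
        #    where the reported efficiency gain would come from.
        for sid, mask in proposals.items():
            verdict, reviewed_mask = expert_review(sid, mask)
            if verdict in ("accepted", "corrected"):
                labelled[sid] = reviewed_mask

        # 3. Retrain on the enlarged consensus-labelled pool and repeat.
        model = train_model(labelled)

    return model, labelled
```

Passing the three steps in as callables keeps the sketch self-contained while leaving the model architecture, inference, and review interface unspecified, as the summary does.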
📝 Abstract
Accurate segmentation of vestibular schwannoma (VS) on Magnetic Resonance Imaging (MRI) is essential for patient management but often requires time-intensive manual annotation by experts. While recent advances in deep learning (DL) have enabled automated segmentation, robust performance across diverse datasets and complex clinical cases remains a challenge. We present an annotated dataset produced by a bootstrapped DL-based framework for iterative segmentation and quality refinement of VS in MRI. We combine data from multiple centres and rely on expert consensus to ensure trustworthy annotations. We show that our approach enables effective and resource-efficient generalisation of automated segmentation models to a target data distribution. The framework achieved a significant improvement in segmentation accuracy, with the Dice Similarity Coefficient (DSC) increasing from 0.9125 to 0.9670 on our target internal validation dataset, while maintaining stable performance on representative external datasets. Expert evaluation of 143 scans further highlighted areas for model refinement, revealing nuanced cases where segmentation required expert intervention. The proposed approach is estimated to improve efficiency by approximately 37.4% compared with the conventional manual annotation process. Overall, our human-in-the-loop model training approach achieved high segmentation accuracy, highlighting its potential as a clinically adaptable and generalisable strategy for automated VS segmentation across diverse clinical settings. The dataset covers 190 patients: tumour annotations are available for 534 longitudinal contrast-enhanced T1-weighted (T1CE) scans from 184 patients, and non-annotated T2-weighted scans are provided for 6 patients. The dataset is publicly accessible on The Cancer Imaging Archive (TCIA) (https://doi.org/10.7937/bq0z-xa62).
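For reference, the DSC reported above is the standard overlap metric, DSC = 2|A ∩ B| / (|A| + |B|) for predicted and ground-truth masks A and B. A minimal NumPy implementation for binary masks (the function name `dice` is our own, not from the paper's code):

```python
import numpy as np

def dice(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-7) -> float:
    """Dice Similarity Coefficient between two binary masks:
    DSC = 2|A ∩ B| / (|A| + |B|)."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    # eps guards against division by zero when both masks are empty
    return float(2.0 * intersection / (pred.sum() + truth.sum() + eps))
```

On this scale, the reported improvement from 0.9125 to 0.9670 is a gain of 0.0545, i.e. roughly 5.45 percentage points of additional overlap between predicted and expert-annotated tumour volumes.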