🤖 AI Summary
White matter hyperintensity (WMH) segmentation is difficult because WMHs vary widely in shape, location, and size, have ill-defined borders, and resemble stroke lesions and artifacts in intensity; models can also fail silently, for example by missing clusters of small deep WMHs. Method: We benchmark uncertainty quantification (UQ) techniques for WMH segmentation on in-domain and out-of-distribution test data and find that combining Stochastic Segmentation Networks with Deep Ensembles gives the highest Dice score and the lowest absolute volume difference (AVD %). We then propose a novel Fazekas score classification method that combines spatial WMH features with UQ maps, evaluated separately for the deep and periventricular WMH regions. Results: With UQ and spatial features, median class-balanced accuracy is 0.71 for deep WMH and 0.82 for periventricular WMH, compared with 0.66/0.77 using spatial features alone and 0.60/0.73 using WMH volume only; UQ also improves classification calibration. Stochastic UQ techniques with high sample diversity improve the detection of poor-quality segmentations, and uncertainty maps highlight regions of ambiguity between WMH and stroke lesions as well as clusters of small deep WMHs left unsegmented by the model.
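The summary quotes Dice, AVD %, and class-balanced accuracy without defining them. As a minimal sketch, assuming binary NumPy masks on the same voxel grid (so voxel counts are proportional to volume), the standard definitions could be written as follows; the function names are hypothetical and this is not the authors' evaluation code.

```python
import numpy as np


def dice(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice overlap between two binary masks (defined as 1.0 if both are empty)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0
    return float(2.0 * np.logical_and(pred, truth).sum() / denom)


def avd_percent(pred: np.ndarray, truth: np.ndarray) -> float:
    """Absolute volume difference as a percentage of the reference volume."""
    v_pred = int(pred.astype(bool).sum())
    v_true = int(truth.astype(bool).sum())
    if v_true == 0:
        raise ValueError("AVD % is undefined for an empty reference mask")
    return abs(v_pred - v_true) / v_true * 100.0


def class_balanced_accuracy(y_true, y_pred) -> float:
    """Mean per-class recall over the classes present in y_true (e.g. Fazekas 0-3)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))


# Example: 3 predicted voxels, 2 reference voxels, 1 voxel of overlap.
pred = np.array([[1, 1, 0], [1, 0, 0]])
truth = np.array([[1, 0, 0], [0, 0, 1]])
print(dice(pred, truth), avd_percent(pred, truth))  # 0.4 50.0
```

Class-balanced accuracy weights every Fazekas grade equally, so a classifier cannot score well simply by predicting the most common grade when some grades are rare.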
📝 Abstract
White Matter Hyperintensities (WMH) are key neuroradiological markers of small vessel disease present in brain MRI. Assessment of WMH is important in research and in clinical practice. However, WMH are challenging to segment due to their high variability in shape, location, and size, their poorly defined borders, and their intensity profile, which is similar to that of other pathologies (e.g. stroke lesions) and artefacts (e.g. head motion). In this work, we apply the most effective techniques for uncertainty quantification (UQ) in segmentation to the WMH segmentation task across multiple test-time data distributions. We find that a combination of Stochastic Segmentation Networks with Deep Ensembles yields the highest Dice and the lowest Absolute Volume Difference % (AVD) on both in-domain and out-of-distribution data. We demonstrate the downstream utility of UQ by proposing a novel method for classification of the clinical Fazekas score using spatial features extracted from WMH segmentations and UQ maps. We show that incorporating WMH uncertainty information improves Fazekas classification performance and calibration: median class-balanced accuracy for classification models using UQ and spatial WMH features, spatial WMH features only, and WMH volume only is 0.71, 0.66, and 0.60 respectively in the Deep WMH region, and 0.82, 0.77, and 0.73 in the Periventricular WMH region. We demonstrate that stochastic UQ techniques with high sample diversity can improve the detection of poor-quality segmentations. Finally, we qualitatively analyse the semantic information captured by UQ techniques and demonstrate that uncertainty can highlight areas of ambiguity between WMH and stroke lesions, while also identifying clusters of small WMH in deep white matter that the model left unsegmented.
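The abstract does not spell out how a voxelwise uncertainty map or sample diversity is computed. A minimal sketch follows, assuming foreground-probability samples have already been drawn (M ensemble members, S stochastic samples each, for instance from a Stochastic Segmentation Network). Predictive entropy of the pooled mean is one common choice of UQ map, and mean pairwise (1 - Dice) is one simple diversity measure; neither is confirmed as the authors' exact choice. `region_features` is a hypothetical illustration of summarising a segmentation and a UQ map inside a region mask such as deep or periventricular white matter, not the paper's actual Fazekas feature set.

```python
import itertools

import numpy as np


def pooled_uncertainty(samples: np.ndarray, eps: float = 1e-7):
    """Mean foreground probability and binary predictive entropy (in nats).

    Assumed layout: samples has shape (M, S, *spatial) and holds foreground
    probabilities from M ensemble members, each drawing S stochastic samples.
    """
    flat = samples.reshape(-1, *samples.shape[2:])  # pool all M * S samples
    p = np.clip(flat.mean(axis=0), eps, 1.0 - eps)  # avoid log(0)
    entropy = -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))  # at most ln 2
    return p, entropy


def sample_diversity(samples: np.ndarray, threshold: float = 0.5) -> float:
    """Mean pairwise (1 - Dice) between thresholded samples; 0 means identical."""
    masks = samples.reshape(-1, *samples.shape[2:]) >= threshold
    distances = []
    for a, b in itertools.combinations(masks, 2):
        denom = a.sum() + b.sum()
        dsc = 1.0 if denom == 0 else 2.0 * np.logical_and(a, b).sum() / denom
        distances.append(1.0 - dsc)
    return float(np.mean(distances)) if distances else 0.0


def region_features(p, entropy, region_mask, threshold: float = 0.5) -> dict:
    """Hypothetical per-region features, e.g. for a deep or periventricular mask."""
    region = region_mask.astype(bool)
    return {
        "wmh_voxels": int(((p >= threshold) & region).sum()),
        "uncertainty_sum": float(entropy[region].sum()),
    }


# Toy example: M=2 members x S=2 samples over a 3-voxel "image".
samples = np.array([[[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
                    [[1.0, 1.0, 0.0], [1.0, 0.0, 0.0]]])
p, h = pooled_uncertainty(samples)
print(region_features(p, h, np.array([1, 1, 0])), sample_diversity(samples))
```

Pooling samples across members lets the entropy map reflect disagreement both between ensemble members and between samples from a single member, which is one intuition for why combining Stochastic Segmentation Networks with Deep Ensembles can help.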