🤖 AI Summary
This work addresses the high annotation cost of fully supervised 3D segmentation of high-resolution ex vivo MRI with a weakly supervised framework built on sparse 2D slice annotations. The approach uses a 2D teacher–3D student architecture: a 2D teacher trained on the sparse annotations generates dense pseudo-labels, which are then used to train a 3D student. Systematic evaluation shows that human-centric visual enhancements such as CLAHE hurt model performance, lowering gray matter lesion Dice scores by approximately 25 points, and that regularization must be chosen separately for the 2D and 3D models. Strong spatial augmentation and soft labels improve the 2D teacher's white matter lesion Dice by more than 11 points, but applying the same strategy to the 3D student degrades its performance. Together, these results point to key challenges and optimization principles for cross-dimensional weakly supervised learning.
📝 Abstract
INTRODUCTION | Fully supervised 3D segmentation of high-resolution ex vivo MRI is limited by the prohibitive cost of volumetric annotation, forcing reliance on sparse 2D slices. Weakly supervised Sparse-to-Dense frameworks bridge this gap, but it remains unclear whether human-centric visual enhancements help machine models and whether optimization strategies transfer across dimensions. We analyze how regularization needs diverge between 2D and 3D models for multi-class segmentation of high-resolution ex vivo spinal cord MRI.
METHODS | We used 9.4T MRI of multiple sclerosis spinal cords (>104,000 slices) with sparse annotations (428 slices). A 2D Teacher trained on sparse slices generated dense pseudo-labels to train a 3D Student. We systematically evaluated the impact of human-centric preprocessing, spatial augmentation, and soft-label regularization on both architectures.
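The sparse-to-dense step described above can be illustrated with a minimal sketch. It assumes a volume with intensities scaled to [0, 1], and it replaces the trained 2D Teacher with a hypothetical stand-in (`teacher_predict_slice`, a simple intensity-banding rule) so the example runs without model weights. The function names, the three-class setup, and the slicing axis are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def teacher_predict_slice(slice_2d: np.ndarray, n_classes: int = 3) -> np.ndarray:
    """Hypothetical stand-in for a trained 2D teacher.

    Returns per-class scores with shape (n_classes, H, W). A real teacher
    would be a trained segmentation network; here intensities in [0, 1]
    are simply binned into n_classes equal-width bands.
    """
    edges = np.linspace(0.0, 1.0, n_classes + 1)[1:-1]   # interior bin edges
    labels = np.digitize(slice_2d, edges)                # values in 0..n_classes-1
    one_hot = np.eye(n_classes, dtype=np.float32)[labels]  # (H, W, C)
    return one_hot.transpose(2, 0, 1)                    # (C, H, W)


def dense_pseudo_labels(volume: np.ndarray, axis: int = 0, n_classes: int = 3) -> np.ndarray:
    """Run the 2D teacher on every slice and stack the hard labels into a 3D volume."""
    slices = np.moveaxis(volume, axis, 0)                # iterate along the chosen axis
    preds = [teacher_predict_slice(s, n_classes).argmax(axis=0) for s in slices]
    stacked = np.stack(preds, axis=0)
    return np.moveaxis(stacked, 0, axis).astype(np.uint8)


# Toy usage: a random volume stands in for the MRI data.
rng = np.random.default_rng(0)
toy_volume = rng.random((5, 8, 8))
pseudo = dense_pseudo_labels(toy_volume, axis=0)
```

The resulting label volume has the same shape as the input and could serve as the dense training target for a 3D Student. Whether the actual pipeline keeps hard labels or per-class probabilities at this stage is not stated in the abstract.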
RESULTS | We identified a critical divergence in training dynamics. The 2D Teacher required strong spatial augmentation and soft-labeling to overcome data scarcity, improving White Matter Lesion Dice scores by >11 points. However, propagating these techniques to the 3D Student degraded its performance. Furthermore, human-centric preprocessing (e.g., CLAHE) disrupted global statistical cues, dropping Gray Matter Lesion Dice scores by ~25 points.
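For reference, the Dice figures above are per-class overlap scores, and a "point" corresponds to 0.01 on the 0 to 1 Dice scale. The sketch below computes per-class Dice and shows one common form of soft labels, uniform label smoothing. The abstract does not specify which soft-labeling scheme was used, so `smooth_labels` and its `epsilon` parameter are assumptions for illustration only.

```python
import numpy as np


def dice_score(pred: np.ndarray, target: np.ndarray, label: int) -> float:
    """Dice coefficient for one class: 2 * |P ∩ T| / (|P| + |T|)."""
    p = pred == label
    t = target == label
    denom = p.sum() + t.sum()
    if denom == 0:
        return 1.0  # convention chosen here: class absent in both masks counts as perfect
    return float(2.0 * np.logical_and(p, t).sum() / denom)


def smooth_labels(labels: np.ndarray, n_classes: int, epsilon: float = 0.1) -> np.ndarray:
    """Uniform label smoothing (an assumed soft-labeling scheme).

    Each one-hot target keeps 1 - epsilon on the true class and spreads
    epsilon evenly over all classes, so every target vector still sums to 1.
    """
    one_hot = np.eye(n_classes, dtype=np.float32)[labels]
    return one_hot * (1.0 - epsilon) + epsilon / n_classes


# Toy 2D example with three classes (0, 1, 2).
pred = np.array([[1, 1, 0], [0, 2, 2]])
target = np.array([[1, 0, 0], [0, 2, 2]])
dice_class1 = dice_score(pred, target, label=1)   # overlap 1, sizes 2 and 1
dice_class2 = dice_score(pred, target, label=2)   # identical masks
soft = smooth_labels(target, n_classes=3, epsilon=0.1)
```

On this scale, an improvement of 11 points would mean, for example, a Dice moving from 0.60 to 0.71; these numbers are purely illustrative and are not taken from the paper.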
DISCUSSION | Our study highlights a perception divergence (human-centric contrast enhancement harms machine models) and a regularization conflict across dimensions. 3D architectures trained on dense pseudo-labels exhibit fundamentally different optimization landscapes from their 2D counterparts and require distinct, conservative regularization. Code and models: https://github.com/ivadomed/model_seg_sc-gm-lesion_human_ms_exvivo_t2star