🤖 AI Summary
3D brain tumor segmentation in MRI faces two challenges: high computational cost and insufficient modeling of spatial continuity across adjacent slices. To address both, we propose M-Net, a lightweight, temporally aware segmentation framework that treats a stack of MRI slices like a sequence of video frames and uses sequence models to capture inter-slice spatial dependencies. We introduce Mesh-Cast, a novel mechanism that unifies channel-wise and time-wise (slice-wise) feature aggregation, and design a Two-Phase Sequential (TPS) training strategy that first learns patterns common across sequences and then refines slice-specific representations. By avoiding costly full 3D convolutions, our method outperforms existing methods on BraTS2019 and BraTS2023, improving both segmentation accuracy and cross-slice consistency. The experimental results support the effectiveness and generalizability of temporal modeling for MRI segmentation.
📝 Abstract
MRI tumor segmentation remains a critical challenge in medical imaging, where volumetric analysis is computationally demanding because of the complexity of 3D data. The spatially sequential arrangement of adjacent MRI slices provides valuable information that enhances segmentation continuity and accuracy, yet many existing models underutilize it. The spatial correlations between adjacent MRI slices can be regarded as "temporal-like" data, similar to frame sequences in video segmentation tasks. To bridge this gap, we propose M-Net, a flexible framework specifically designed for sequential image segmentation. M-Net introduces the novel Mesh-Cast mechanism, which integrates arbitrary sequential models into the processing of both channel and temporal information, thereby systematically capturing the inherent "temporal-like" spatial correlations between MRI slices. Additionally, we define an MRI sequential input pattern and design a Two-Phase Sequential (TPS) training strategy, which first focuses on learning common patterns across sequences and then refines slice-specific feature extraction. This approach uses temporal modeling techniques to preserve volumetric contextual information while avoiding the high computational cost of full 3D convolutions, thereby enhancing the generalizability and robustness of M-Net in sequential segmentation tasks. Experiments on the BraTS2019 and BraTS2023 datasets demonstrate that M-Net outperforms existing methods across all key metrics, establishing it as a robust solution for temporally aware MRI tumor segmentation.
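The abstract does not specify how Mesh-Cast is implemented, so the following is only a rough illustration of the general idea it describes: running one sequence model along both the slice ("temporal-like") axis and the channel axis of a feature tensor. Everything here is an assumption made for illustration: the names `ema_scan` and `mesh_mix`, the `(T, C, H, W)` layout, the residual sum, and the use of a causal exponential moving average as a stand-in for an arbitrary sequential model. It is not the authors' method.

```python
import numpy as np

def ema_scan(x, axis, alpha=0.5):
    """Hypothetical stand-in for a sequence model: a causal exponential
    moving average that scans along `axis`, one step at a time."""
    x = np.moveaxis(x, axis, 0)           # bring the scanned axis to the front
    out = np.empty_like(x)
    state = np.zeros_like(x[0])
    for i in range(x.shape[0]):
        state = alpha * x[i] + (1.0 - alpha) * state
        out[i] = state
    return np.moveaxis(out, 0, axis)      # restore the original layout

def mesh_mix(feats, seq_model=ema_scan):
    """feats has shape (T, C, H, W): T slices, C channels per slice.
    Apply the same sequence model along the slice axis and along the
    channel axis, then combine both with the input (an assumed choice)."""
    along_slices = seq_model(feats, axis=0)    # inter-slice context
    along_channels = seq_model(feats, axis=1)  # inter-channel context
    return feats + along_slices + along_channels

# Toy input: 4 adjacent slices, 3 feature channels, 8x8 spatial grid.
rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 3, 8, 8))
mixed = mesh_mix(feats)
print(mixed.shape)
```

Because each scan works on 2D slices in sequence, the cost grows linearly with the number of slices, which is the kind of saving over full 3D convolutions that the abstract points to. Any callable with the same `(x, axis)` signature could be passed as `seq_model`.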