🤖 AI Summary
This work addresses the drop in brain tumor segmentation performance that occurs when clinical multimodal MRI scans lack one or more modalities. The authors propose UniME, a two-stage heterogeneous architecture. Stage 1 pretrains a single Vision Transformer (ViT) encoder with masked image modeling to learn a global representation that is robust to missing modalities. Stage 2 adds modality-specific CNN encoders that capture fine-grained, multi-scale features, which are fused with the global representation for segmentation. By decoupling representation learning from segmentation, UniME preserves cross-modal complementarity and high-resolution detail while improving robustness to missing modalities. On the BraTS 2023 and BraTS 2024 datasets, UniME outperforms previous methods under incomplete-modality conditions.
📝 Abstract
Multimodal MRI offers complementary information for brain tumor segmentation, but clinical scans often lack one or more modalities, which degrades segmentation performance. In this paper, we propose UniME (Uni-Encoder Meets Multi-Encoders), a two-stage heterogeneous method for brain tumor segmentation with missing modalities that reconciles the trade-offs among fine-grained structure capture, cross-modal complementarity modeling, and exploitation of available modalities. The key idea is to decouple representation learning from segmentation. Stage 1 pretrains a single ViT Uni-Encoder with masked image modeling to establish a unified representation robust to missing modalities. Stage 2 adds modality-specific CNN Multi-Encoders to extract high-resolution, multi-scale, fine-grained features, which are fused with the global representation to produce precise segmentations. Experiments on BraTS 2023 and BraTS 2024 show that UniME outperforms previous methods in incomplete multimodal scenarios. The code is available at https://github.com/Hooorace-S/UniME
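To make the "missing modalities" setting concrete, the following minimal sketch enumerates every possible availability pattern and simulates an incomplete scan. It is an illustration only and is not taken from the UniME repository. It assumes the four standard BraTS MRI sequences (the abstract does not list them), uses zero-filling as one common convention for an absent modality, and the names `MODALITIES`, `availability_patterns`, and `drop_missing` are hypothetical.

```python
from itertools import combinations

# Assumption: the four standard BraTS sequences; not stated in the abstract.
MODALITIES = ("T1", "T1ce", "T2", "FLAIR")


def availability_patterns(modalities=MODALITIES):
    """Return every non-empty subset of modalities as a tuple of booleans.

    Each tuple is aligned with `modalities`; True means the modality is present.
    """
    n = len(modalities)
    patterns = []
    for k in range(1, n + 1):
        for present in combinations(range(n), k):
            patterns.append(tuple(i in present for i in range(n)))
    return patterns


def drop_missing(scan, pattern, modalities=MODALITIES, fill=0.0):
    """Simulate an incomplete scan by filling absent modalities with `fill`.

    `scan` maps a modality name to a flat list of intensities (a toy layout
    chosen for this sketch). Zero-filling is an assumption, not necessarily
    what UniME does.
    """
    out = {}
    for name, keep in zip(modalities, pattern):
        values = scan[name]
        out[name] = list(values) if keep else [fill] * len(values)
    return out


if __name__ == "__main__":
    patterns = availability_patterns()
    print(len(patterns))  # number of non-empty subsets of four modalities

    toy_scan = {name: [1.0, 2.0, 3.0] for name in MODALITIES}
    only_t1_and_flair = (True, False, False, True)
    print(drop_missing(toy_scan, only_t1_and_flair))
```

With four modalities there are 2^4 - 1 = 15 non-empty patterns, which is why missing-modality benchmarks typically report results over 15 input combinations. A model that is robust in the sense described above should degrade gracefully across all of them rather than only on the complete-input case.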