UniMRSeg: Unified Modality-Relax Segmentation via Hierarchical Self-Supervised Compensation

📅 2025-09-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Multimodal image segmentation suffers significant performance degradation under missing or corrupted modalities, and existing approaches require a dedicated model for each modality combination, incurring high deployment costs and poor generalizability. This paper proposes UniMRSeg, a unified modality-relax segmentation network and the first to achieve cross-modal adaptive compensation at the input, feature, and output levels. Key innovations include: (i) a hierarchical self-supervised compensation mechanism integrating hybrid shuffled-masking reconstruction and modality-invariant contrastive learning; (ii) a lightweight reverse attention adapter; and (iii) a hybrid consistency fine-tuning strategy. UniMRSeg enables robust segmentation over arbitrary modality subsets with a single model. Extensive experiments on MRI-based brain tumor segmentation, RGB-D semantic segmentation, and RGB-D/T salient object detection demonstrate consistent superiority over state-of-the-art methods, significantly improving segmentation accuracy under missing modalities while enhancing deployment flexibility.

📝 Abstract
Multi-modal image segmentation faces real-world deployment challenges: incomplete or corrupted modalities degrade performance. While existing methods address training-inference modality gaps via specialized per-combination models, they introduce high deployment costs by requiring exhaustive model subsets and model-modality matching. In this work, we propose a unified modality-relax segmentation network (UniMRSeg) through hierarchical self-supervised compensation (HSSC). Our approach hierarchically bridges representation gaps between complete and incomplete modalities across the input, feature, and output levels. First, we adopt modality reconstruction with the hybrid shuffled-masking augmentation, encouraging the model to learn intrinsic modality characteristics and generate meaningful representations for missing modalities through cross-modal fusion. Next, modality-invariant contrastive learning implicitly compensates the feature-space distance among incomplete-complete modality pairs. Furthermore, the proposed lightweight reverse attention adapter explicitly compensates for the weak perceptual semantics in the frozen encoder. Finally, UniMRSeg is fine-tuned under the hybrid consistency constraint to ensure stable prediction under all modality combinations without large performance fluctuations. Without bells and whistles, UniMRSeg significantly outperforms state-of-the-art methods under diverse missing-modality scenarios on MRI-based brain tumor segmentation, RGB-D semantic segmentation, and RGB-D/T salient object segmentation. The code will be released at https://github.com/Xiaoqi-Zhao-DLUT/UniMRSeg.
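The hybrid shuffled-masking augmentation described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the flat patch-token representation, the mask/shuffle ratios, and the function name are all assumptions made for illustration. The idea is to corrupt the input two ways at once (zero out some tokens, shuffle others) so the model must reconstruct the intact modality:

```python
import random

def hybrid_shuffle_mask(tokens, mask_ratio=0.3, shuffle_ratio=0.2,
                        mask_value=0, seed=0):
    """Toy hybrid shuffled-masking corruption over a list of patch tokens.

    A random subset of positions is replaced with ``mask_value`` and a
    disjoint random subset is shuffled among themselves; the model would
    then be trained to reconstruct the original ``tokens``.
    """
    rng = random.Random(seed)
    out = list(tokens)
    n = len(out)

    # Pick disjoint masked / shuffled index sets from one random ordering.
    idx = list(range(n))
    rng.shuffle(idx)
    n_mask = int(n * mask_ratio)
    n_shuffle = int(n * shuffle_ratio)
    masked = idx[:n_mask]
    shuffled = idx[n_mask:n_mask + n_shuffle]

    # Corruption 1: zero-mask.
    for i in masked:
        out[i] = mask_value

    # Corruption 2: permute the values at the shuffled positions.
    vals = [out[i] for i in shuffled]
    rng.shuffle(vals)
    for i, v in zip(shuffled, vals):
        out[i] = v

    return out, masked, shuffled
```

In a full pipeline this corruption would be applied per modality before cross-modal fusion, with the reconstruction loss computed against the uncorrupted input.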
Problem

Research questions and friction points this paper is trying to address.

Addresses performance degradation from incomplete/corrupted multi-modal images
Eliminates need for specialized models per modality combination
Compensates representation gaps across input, feature and output levels
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical self-supervised compensation across input, feature, and output levels
Modality-invariant contrastive learning for feature compensation
Lightweight reverse attention adapter for semantic compensation
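The "single model for arbitrary modality subsets" requirement and the hybrid consistency constraint can be sketched in a few lines. This is a toy illustration, not the paper's loss: the subset enumeration is standard combinatorics, and the scalar-prediction consistency penalty is a hypothetical stand-in for the real segmentation-map comparison:

```python
from itertools import combinations

def modality_subsets(modalities):
    """Enumerate every non-empty modality subset a single unified
    model must handle (2^M - 1 combinations for M modalities)."""
    subs = []
    for r in range(1, len(modalities) + 1):
        subs.extend(combinations(modalities, r))
    return subs

def consistency_penalty(preds):
    """Toy consistency term: mean absolute disagreement between each
    subset's prediction and the complete-modality prediction, which is
    assumed (for illustration) to be the last entry in ``preds``."""
    full = preds[-1]
    return sum(abs(p - full) for p in preds) / len(preds)
```

For the four BraTS MRI modalities this yields 15 subsets, which is exactly the space of missing-modality scenarios a per-combination approach would need 15 dedicated models to cover.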