Learning Robust Anymodal Segmentor with Unimodal and Cross-modal Distillation

📅 2024-11-26

🏛️ arXiv.org

📈 Citations: 4

✨ Influential: 0

career value

209K/year

🤖 AI Summary

To address the severe performance degradation in multimodal segmentation when certain modalities are missing—caused by single-modality bias—this paper proposes the first any-modal segmentation framework, capable of processing arbitrary subsets of visual modalities. Methodologically, we introduce multi-scale feature-level cross-modal and unimodal distillation, coupled with modality-agnostic semantic distillation at the prediction layer, to disentangle modality-specific knowledge from teacher models and transfer shared semantic representations. Robust knowledge transfer is further enabled via parallel multimodal teacher learning. Evaluated on both synthetic and real-world multi-sensor benchmarks—including NuScenes and SemanticKITTI—our approach consistently outperforms existing state-of-the-art methods across all modality combinations, achieving high accuracy and exceptional robustness under partial modality availability. This work establishes a reliable foundation for segmentation in multi-sensor perception systems.

Technology Category

Application Category

📝 Abstract

Simultaneously using multimodal inputs from multiple sensors to train segmentors is intuitively advantageous but practically challenging. A key challenge is unimodal bias, where multimodal segmentors over rely on certain modalities, causing performance drops when others are missing, common in real world applications. To this end, we develop the first framework for learning robust segmentor that can handle any combinations of visual modalities. Specifically, we first introduce a parallel multimodal learning strategy for learning a strong teacher. The cross-modal and unimodal distillation is then achieved in the multi scale representation space by transferring the feature level knowledge from multimodal to anymodal segmentors, aiming at addressing the unimodal bias and avoiding over-reliance on specific modalities. Moreover, a prediction level modality agnostic semantic distillation is proposed to achieve semantic knowledge transferring for segmentation. Extensive experiments on both synthetic and real-world multi-sensor benchmarks demonstrate that our method achieves superior performance.

Problem

Research questions and friction points this paper is trying to address.

Addresses unimodal bias in multimodal segmentors

Enables robust segmentation with any modality combination

Reduces over-reliance on specific missing modalities

Innovation

Methods, ideas, or system contributions that make the work stand out.

Parallel multimodal learning for strong teacher

Multi-scale cross-modal unimodal distillation

Modality-agnostic semantic distillation for segmentation

🔎 Similar Papers

No similar papers found.