CrossWeaver: Cross-modal Weaving for Arbitrary-Modality Semantic Segmentation

📅 2026-04-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing multimodal semantic segmentation methods struggle to efficiently coordinate complementary information across arbitrary modality combinations, and often compromise modality-specific characteristics in the process. To address this, the paper proposes CrossWeaver, a novel framework that employs Modality Interaction Blocks (MIBs) to enable selective, reliability-aware cross-modal interactions. Additionally, a lightweight Seam-Aligned Fusion (SAF) module is introduced to aggregate and enhance multimodal features. CrossWeaver supports any combination of input modalities, significantly improving model generalization with minimal parameter overhead. Extensive experiments demonstrate that CrossWeaver achieves state-of-the-art performance on multiple benchmarks and exhibits strong generalization to unseen modality combinations.
📝 Abstract
Multimodal semantic segmentation has shown great potential in leveraging complementary information across diverse sensing modalities. However, existing approaches often rely on carefully designed fusion strategies that either use modality-specific adaptations or rely on loosely coupled interactions, thereby limiting flexibility and resulting in less effective cross-modal coordination. Moreover, these methods often struggle to balance efficient information exchange with preserving the unique characteristics of each modality across different modality combinations. To address these challenges, we propose CrossWeaver, a simple yet effective multimodal fusion framework for arbitrary-modality semantic segmentation. Its core is a Modality Interaction Block (MIB), which enables selective and reliability-aware cross-modal interaction within the encoder, while a lightweight Seam-Aligned Fusion (SAF) module further aggregates the enhanced features. Extensive experiments on multiple multimodal semantic segmentation benchmarks demonstrate that our framework achieves state-of-the-art performance with minimal additional parameters and strong generalization to unseen modality combinations.
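The abstract describes two components: an encoder-level Modality Interaction Block (MIB) that exchanges information between modality streams in a selective, reliability-aware way, and a lightweight Seam-Aligned Fusion (SAF) module that aggregates the resulting features. The paper's actual formulation is not given here, so the sketch below is purely illustrative: it assumes a simple activation-energy proxy for per-modality reliability, a softmax gate over modalities, and residual cross-modal exchange. All function names (`reliability_gate`, `modality_interaction`, `seam_fusion`) and the `alpha` mixing parameter are hypothetical, not from the paper.

```python
import numpy as np

def reliability_gate(*feats):
    """Score each modality by a simple reliability proxy
    (mean activation energy), normalized with a softmax.
    This proxy is an assumption, not the paper's criterion."""
    energy = np.array([np.mean(f ** 2) for f in feats])
    exp = np.exp(energy - energy.max())
    return exp / exp.sum()

def modality_interaction(feat_a, feat_b, alpha=0.5):
    """Hypothetical MIB-style step: each stream is preserved and
    receives a residual from the other, scaled by the other
    stream's reliability weight (selective exchange)."""
    w_a, w_b = reliability_gate(feat_a, feat_b)
    out_a = feat_a + alpha * w_b * feat_b  # b informs a
    out_b = feat_b + alpha * w_a * feat_a  # a informs b
    return out_a, out_b

def seam_fusion(feats):
    """Hypothetical SAF-style aggregation: a reliability-weighted
    sum over an arbitrary number of modality streams, so any
    modality combination yields one fused feature map."""
    w = reliability_gate(*feats)
    return sum(wi * f for wi, f in zip(w, feats))
```

Because both the gate and the fusion operate over a variable-length list of streams, the same code path handles any modality combination, which mirrors the arbitrary-modality claim in the abstract; the real model would of course use learned, spatially varying gates rather than a global scalar.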
Problem

Research questions and friction points this paper is trying to address.

multimodal semantic segmentation
cross-modal interaction
arbitrary-modality
modality fusion
feature coordination
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-modal fusion
arbitrary-modality segmentation
Modality Interaction Block
Seam-Aligned Fusion
multimodal semantic segmentation
Zelin Zhang
The University of Sydney
Kedi Li
The University of Sydney
Huiqi Liang
The University of Sydney
Tao Zhang
University of Technology Sydney
Chuanzhi Xu
Student, The University of Sydney
Neuromorphic Vision · High-level Vision · Computational Aesthetics