🤖 AI Summary
This work addresses the efficiency and generalization bottlenecks in semantic segmentation across arbitrary sensor modalities, which stem from significant modality discrepancies and the repeated development of modality-specific methods. To overcome these challenges, the authors propose a unified multimodal semantic segmentation framework that leverages a Modality-aware CLIP (MA-CLIP) architecture to achieve cross-modal semantic alignment. The framework incorporates Modality-aligned Embeddings to extract fine-grained features and a Domain-specific Refinement Module (DSRM) that dynamically adapts representations. Built on a LoRA-finetuned CLIP backbone, the model enables end-to-end joint training across diverse complementary modalities, including event, thermal, depth, polarization, and light-field data alongside RGB. Evaluated on five heterogeneous modality datasets, the approach achieves a state-of-the-art mean Intersection-over-Union (mIoU) of 65.03%, substantially outperforming existing specialized methods.
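The LoRA fine-tuning mentioned above adapts a frozen pretrained backbone by learning only a low-rank additive update to each targeted weight matrix. A minimal sketch of the idea, with illustrative dimensions and scaling not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen pretrained weight of one linear layer in the backbone
# (dimensions are illustrative, not from the paper).
d_out, d_in = 8, 8
W = rng.standard_normal((d_out, d_in))  # stays frozen during fine-tuning

# LoRA: trainable low-rank update B @ A with rank r << d, scaled by alpha / r.
r, alpha = 2, 4
A = rng.standard_normal((r, d_in)) * 0.01  # trainable, small random init
B = np.zeros((d_out, r))                   # trainable, zero init => no change at start

def lora_forward(x):
    """Adapted forward pass: frozen W plus the low-rank trainable update."""
    return x @ (W + (alpha / r) * (B @ A)).T

x = rng.standard_normal((1, d_in))
# With B initialized to zero, the adapted layer exactly matches the frozen one,
# so fine-tuning starts from the pretrained behavior.
assert np.allclose(lora_forward(x), x @ W.T)
```

Only `A` and `B` (2 × r × d parameters per layer instead of d²) receive gradients, which is what makes per-modality adaptation of a large CLIP backbone cheap.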
📝 Abstract
Semantic segmentation across arbitrary sensor modalities faces significant challenges due to diverse sensor characteristics, and traditional modality-specific configurations result in redundant development effort. We address these challenges by introducing a universal arbitrary-modal semantic segmentation framework that unifies segmentation across multiple modalities. Our approach features three key innovations: (1) the Modality-aware CLIP (MA-CLIP), which provides modality-specific scene understanding guidance through LoRA fine-tuning; (2) Modality-aligned Embeddings for capturing fine-grained features; and (3) the Domain-specific Refinement Module (DSRM) for dynamic feature adjustment. Evaluated on five diverse datasets with different complementary modalities (event, thermal, depth, polarization, and light field), our model surpasses specialized multi-modal methods and achieves state-of-the-art performance with an mIoU of 65.03%. The code will be released upon acceptance.