DIS2: Disentanglement Meets Distillation with Classwise Attention for Robust Remote Sensing Segmentation under Missing Modalities

📅 2026-01-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the significant performance degradation that remote sensing multimodal segmentation suffers under missing modalities, a challenge exacerbated by the heterogeneity and scale discrepancies inherent in remote sensing data. To tackle this issue, the authors propose DIS2, a framework that integrates disentanglement learning with knowledge distillation (DLKD) and introduces a Classwise Feature Learning Module (CFLM). CFLM leverages class-aware attention and hierarchical multi-resolution fusion to enable adaptive compensation for missing modalities and robust semantic alignment. Extensive experiments demonstrate that DIS2 substantially outperforms existing methods across multiple remote sensing benchmarks, achieving superior segmentation robustness and accuracy under incomplete modalities.

📝 Abstract
The efficacy of multimodal learning in remote sensing (RS) is severely undermined by missing modalities. The challenge is exacerbated by the high heterogeneity and huge scale variation of RS data. Consequently, paradigms proven effective in other domains often fail when confronted with these unique data characteristics. Conventional disentanglement learning, which relies on significant feature overlap between modalities (modality-invariant features), is insufficient under this heterogeneity. Similarly, knowledge distillation becomes an ill-posed mimicry task in which the student fails to focus on the necessary compensatory knowledge, leaving the semantic gap unaddressed. Our work is therefore built on three pillars designed specifically for RS: (1) principled missing-information compensation, (2) class-specific modality contribution, and (3) multi-resolution feature importance. We propose DIS2, a new paradigm that shifts from dependence on modality-shared features and untargeted imitation to active, guided compensation of missing features. Its core novelty lies in a reformulated synergy between disentanglement learning and knowledge distillation, termed DLKD: compensatory features are explicitly captured such that, when fused with the features of the available modality, they approximate the ideal fused representation of the full-modality case. To address the class-specific challenge, our Classwise Feature Learning Module (CFLM) adaptively learns discriminative evidence for each target class depending on signal availability. Both DLKD and CFLM are supported by a hierarchical hybrid fusion (HF) structure that exploits features across resolutions to strengthen prediction. Extensive experiments validate that our approach significantly outperforms state-of-the-art methods across benchmarks.
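The compensation-via-distillation idea at the heart of DLKD can be illustrated with a minimal numeric sketch. Everything concrete below is an assumption for illustration only, not the paper's architecture: feature dimensions, additive fusion, a linear map as the compensatory branch, and a mean-squared distillation loss all stand in for the actual learned modules. The sketch shows a student that sees one modality, predicts compensatory features for the missing one, and is trained so that its fused representation approximates the full-modality fusion.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # feature dimension (illustrative)

def fuse(f_a, f_b):
    # Stand-in for the full-modality fusion (simple sum; assumption).
    return f_a + f_b

def compensate(f_avail, W):
    # Hypothetical compensatory branch: a linear map predicting the
    # missing modality's features from the available one.
    return f_avail @ W

def distill_loss(f_avail, f_missing, W):
    # Student fuses available features with predicted compensatory ones
    # and is penalized for deviating from the full-modality fusion.
    student = fuse(f_avail, compensate(f_avail, W))
    teacher = fuse(f_avail, f_missing)
    return float(np.mean((student - teacher) ** 2))

# Toy batch: 4 samples, one modality present, one missing.
f_a = rng.normal(size=(4, D))   # available modality (e.g. optical)
f_b = rng.normal(size=(4, D))   # missing modality (e.g. SAR)

W = rng.normal(scale=0.1, size=(D, D))  # compensatory weights
loss_before = distill_loss(f_a, f_b, W)

# One gradient-descent step (closed-form gradient for this linear sketch).
grad = 2 * f_a.T @ (compensate(f_a, W) - f_b) / (4 * D)
W = W - 0.5 * grad
loss_after = distill_loss(f_a, f_b, W)
```

After the update, `loss_after` is smaller than `loss_before`: the compensatory branch has moved its fused output closer to the full-modality target, which is the behavior DLKD trains for at scale with learned, nonlinear modules.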
Problem

Research questions and friction points this paper is trying to address.

missing modalities
remote sensing segmentation
modality heterogeneity
semantic gap
multimodal learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Disentanglement Learning
Knowledge Distillation
Classwise Attention
Missing Modalities
Remote Sensing Segmentation