Lean Learning Beyond Clouds: Efficient Discrepancy-Conditioned Optical-SAR Fusion for Semantic Segmentation

📅 2026-03-21
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses the severe degradation of semantic integrity in optical remote sensing imagery caused by cloud occlusion, and the difficulty existing optical-SAR fusion methods have in balancing global modeling efficiency with cross-modal fusion reliability. To this end, the authors propose the EDC framework, which employs a tri-stream encoder with Carrier Tokens for lightweight global context modeling, introduces a Discrepancy-Conditioned Hybrid Fusion (DCHF) mechanism to selectively suppress unreliable regions, and incorporates a teacher-guided cloud removal auxiliary branch to enhance semantic consistency under occlusion. The discrepancy-conditioned cross-modal fusion strategy is designed to curb cloud-induced noise propagation while reducing model complexity. Experiments report mIoU gains of 0.56% and 0.88% on the M3M-CR and WHU-OPT-SAR datasets, respectively, alongside a 46.7% reduction in parameters and a 1.98× acceleration in inference speed.

📝 Abstract
Cloud occlusion severely degrades the semantic integrity of optical remote sensing imagery. While incorporating Synthetic Aperture Radar (SAR) provides complementary observations, achieving efficient global modeling and reliable cross-modal fusion under cloud interference remains challenging. Existing methods rely on dense global attention to capture long-range dependencies, yet such aggregation indiscriminately propagates cloud-induced noise. Improving robustness typically entails enlarging model capacity, which further increases computational overhead. Given the large-scale and high-resolution nature of remote sensing applications, such computational demands hinder practical deployment, leading to an efficiency-reliability trade-off. To address this dilemma, we propose EDC, an efficiency-oriented and discrepancy-conditioned optical-SAR semantic segmentation framework. A tri-stream encoder with Carrier Tokens enables compact global context modeling with reduced complexity. To prevent noise contamination, we introduce a Discrepancy-Conditioned Hybrid Fusion (DCHF) mechanism that selectively suppresses unreliable regions during global aggregation. In addition, an auxiliary cloud removal branch with teacher-guided distillation enhances semantic consistency under occlusion. Extensive experiments demonstrate that EDC achieves superior accuracy and efficiency, improving mIoU by 0.56% and 0.88% on M3M-CR and WHU-OPT-SAR, respectively, while reducing the number of parameters by 46.7% and accelerating inference by 1.98×. Our implementation is available at https://github.com/mengcx0209/EDC.
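
The abstract does not spell out how DCHF computes or applies its discrepancy signal, so the following is only a minimal sketch of the general idea of discrepancy-conditioned suppression, not the authors' implementation. It assumes, purely for illustration, that discrepancy is the cosine distance between optical and SAR token features, that reliability is an exponential gate with a hypothetical `temperature` parameter, and that global aggregation is a gate-weighted mean. All function names are hypothetical.

```python
import math

def cosine_discrepancy(a, b, eps=1e-8):
    """Return 1 - cosine similarity of two feature vectors, clamped to be >= 0."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return max(0.0, 1.0 - dot / (norm_a * norm_b + eps))

def reliability_gates(opt_tokens, sar_tokens, temperature=1.0):
    """Assumed gate: exp(-discrepancy / temperature), so each gate lies in (0, 1]."""
    return [math.exp(-cosine_discrepancy(o, s) / temperature)
            for o, s in zip(opt_tokens, sar_tokens)]

def gated_aggregate(opt_tokens, sar_tokens, temperature=1.0):
    """Blend modalities per token, then pool a global context vector.

    Tokens where optical and SAR features disagree (for example under cloud)
    lean on SAR in the blend and contribute less to the pooled context.
    """
    gates = reliability_gates(opt_tokens, sar_tokens, temperature)
    # Per-token blend: reliable tokens keep optical features, others fall back to SAR.
    fused = [[g * o_i + (1.0 - g) * s_i for o_i, s_i in zip(o, s)]
             for g, o, s in zip(gates, opt_tokens, sar_tokens)]
    # Gate-weighted mean as a stand-in for global aggregation.
    total = sum(gates)
    dim = len(fused[0])
    context = [sum(g * f[d] for g, f in zip(gates, fused)) / total
               for d in range(dim)]
    return gates, fused, context

# Toy example: the third optical token is flipped to mimic cloud corruption.
opt = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
sar = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
gates, fused, context = gated_aggregate(opt, sar)
```

In this toy setup the two agreeing tokens receive gates close to 1, while the corrupted token receives a much smaller gate, so its blended feature follows the SAR observation and it carries little weight in the pooled context.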
Problem

Research questions and friction points this paper is trying to address.

cloud occlusion
optical-SAR fusion
semantic segmentation
efficiency-reliability trade-off
cross-modal fusion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Discrepancy-Conditioned Fusion
Carrier Tokens
Optical-SAR Fusion
Efficient Semantic Segmentation
Cloud Occlusion Robustness