Lean Learning Beyond Clouds: Efficient Discrepancy-Conditioned Optical-SAR Fusion for Semantic Segmentation

2026-03-21
AI Summary
This work addresses two problems: cloud occlusion severely degrades the semantic integrity of optical remote sensing imagery, and existing optical-SAR fusion methods struggle to balance global modeling efficiency with cross-modal fusion reliability. The authors propose the EDC framework, which employs a tri-stream encoder with Carrier Tokens for lightweight global context modeling, introduces a Discrepancy-Conditioned Hybrid Fusion (DCHF) mechanism to selectively suppress unreliable regions during global aggregation, and adds an auxiliary cloud removal branch with teacher-guided distillation to enhance semantic consistency under occlusion. The discrepancy-conditioned fusion strategy curbs the propagation of cloud-induced noise while reducing model complexity. Experiments show mIoU gains of 0.56% and 0.88% on the M3M-CR and WHU-OPT-SAR datasets, respectively, alongside a 46.7% reduction in parameters and a 1.98× inference speedup.

Abstract
Cloud occlusion severely degrades the semantic integrity of optical remote sensing imagery. While incorporating Synthetic Aperture Radar (SAR) provides complementary observations, achieving efficient global modeling and reliable cross-modal fusion under cloud interference remains challenging. Existing methods rely on dense global attention to capture long-range dependencies, yet such aggregation indiscriminately propagates cloud-induced noise. Improving robustness typically entails enlarging model capacity, which further increases computational overhead. Given the large-scale and high-resolution nature of remote sensing applications, such computational demands hinder practical deployment, leading to an efficiency-reliability trade-off. To address this dilemma, we propose EDC, an efficiency-oriented and discrepancy-conditioned optical-SAR semantic segmentation framework. A tri-stream encoder with Carrier Tokens enables compact global context modeling with reduced complexity. To prevent noise contamination, we introduce a Discrepancy-Conditioned Hybrid Fusion (DCHF) mechanism that selectively suppresses unreliable regions during global aggregation. In addition, an auxiliary cloud removal branch with teacher-guided distillation enhances semantic consistency under occlusion. Extensive experiments demonstrate that EDC achieves superior accuracy and efficiency, improving mIoU by 0.56% and 0.88% on M3M-CR and WHU-OPT-SAR, respectively, while reducing the number of parameters by 46.7% and accelerating inference by 1.98×. Our implementation is available at https://github.com/mengcx0209/EDC.
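The abstract does not spell out how DCHF computes its suppression, so the following is only a minimal sketch of the general idea of discrepancy-conditioned fusion, not the authors' method. It assumes that a large per-position difference between optical and SAR features signals an unreliable (for example, cloud-covered) optical region, and it down-weights the optical features there. The function names `discrepancy_gate` and `fuse`, the exponential gate, and the `temperature` parameter are all hypothetical choices made for illustration.

```python
import math


def discrepancy_gate(opt_feat, sar_feat, temperature=1.0):
    """Return one reliability weight in (0, 1] per position.

    opt_feat and sar_feat are lists of per-position channel vectors.
    The weight is 1.0 where the two modalities agree exactly and decays
    toward 0 as their mean absolute difference grows (assumed gate form).
    """
    gates = []
    for o, s in zip(opt_feat, sar_feat):
        # Mean absolute difference across channels at this position.
        d = sum(abs(a - b) for a, b in zip(o, s)) / len(o)
        gates.append(math.exp(-d / temperature))
    return gates


def fuse(opt_feat, sar_feat, temperature=1.0):
    """Blend optical and SAR features, trusting SAR where they disagree."""
    gates = discrepancy_gate(opt_feat, sar_feat, temperature)
    fused = []
    for o, s, g in zip(opt_feat, sar_feat, gates):
        # g weights the optical features; (1 - g) weights the SAR features.
        fused.append([g * a + (1.0 - g) * b for a, b in zip(o, s)])
    return fused


# Two positions with two channels each; the second optical position is
# deliberately far from its SAR counterpart to mimic cloud contamination.
optical = [[1.0, 2.0], [9.0, 9.0]]
sar = [[1.0, 2.0], [1.0, 1.0]]
gates = discrepancy_gate(optical, sar)
fused = fuse(optical, sar)
```

In this toy input the first position keeps its optical features unchanged because the modalities agree, while the second position is pulled almost entirely toward the SAR features. A real model would presumably learn the gate and apply it inside the attention or aggregation step rather than use a fixed exponential.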
Problem

Research questions and friction points this paper is trying to address.

cloud occlusion
optical-SAR fusion
semantic segmentation
efficiency-reliability trade-off
cross-modal fusion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Discrepancy-Conditioned Fusion
Carrier Tokens
Optical-SAR Fusion
Efficient Semantic Segmentation
Cloud Occlusion Robustness
Chenxing Meng
School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China
Wuzhou Quan
Nanjing University of Aeronautics and Astronautics
Computer Vision, Pattern Recognition, Remote Sensing
Yingjie Cai
School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China
Liqun Cao
School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China
Liyan Zhang
School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China
Mingqiang Wei
Professor at Nanjing University of Aeronautics and Astronautics
3D Vision, Multimodal Fusion, Computer Graphics, Deep Geometry Learning, CAD