AI Summary
This work addresses two problems: cloud occlusion severely degrades the semantic integrity of optical remote sensing imagery, and existing optical-SAR fusion methods struggle to balance efficient global modeling with reliable cross-modal fusion. The authors propose the EDC framework, which uses a tri-stream encoder with Carrier Tokens for lightweight global context modeling, introduces a Discrepancy-Conditioned Hybrid Fusion (DCHF) mechanism to selectively suppress unreliable regions, and adds an auxiliary cloud removal branch with teacher-guided distillation to enhance semantic consistency under occlusion. Together, these components curb the propagation of cloud-induced noise while reducing model complexity. Experiments show mIoU gains of 0.56% and 0.88% on the M3M-CR and WHU-OPT-SAR datasets, respectively, alongside a 46.7% reduction in parameters and a 1.98× speedup in inference.
Abstract
Cloud occlusion severely degrades the semantic integrity of optical remote sensing imagery. While incorporating Synthetic Aperture Radar (SAR) provides complementary observations, achieving efficient global modeling and reliable cross-modal fusion under cloud interference remains challenging. Existing methods rely on dense global attention to capture long-range dependencies, yet such aggregation indiscriminately propagates cloud-induced noise. Improving robustness typically entails enlarging model capacity, which further increases computational overhead. Given the large-scale and high-resolution nature of remote sensing applications, such computational demands hinder practical deployment, leading to an efficiency-reliability trade-off. To address this dilemma, we propose EDC, an efficiency-oriented and discrepancy-conditioned optical-SAR semantic segmentation framework. A tri-stream encoder with Carrier Tokens enables compact global context modeling with reduced complexity. To prevent noise contamination, we introduce a Discrepancy-Conditioned Hybrid Fusion (DCHF) mechanism that selectively suppresses unreliable regions during global aggregation. In addition, an auxiliary cloud removal branch with teacher-guided distillation enhances semantic consistency under occlusion. Extensive experiments demonstrate that EDC achieves superior accuracy and efficiency, improving mIoU by 0.56\% and 0.88\% on M3M-CR and WHU-OPT-SAR, respectively, while reducing the number of parameters by 46.7\% and accelerating inference by 1.98$\times$. Our implementation is available at https://github.com/mengcx0209/EDC.
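To make the idea of discrepancy-conditioned suppression more concrete, the following is a minimal sketch and not the authors' DCHF implementation, whose exact formulation the abstract does not give. It assumes that optical and SAR features are already aligned in a shared space, that per-pixel disagreement between them is a usable proxy for cloud contamination, and that an exponential gate is an acceptable way to turn disagreement into a weight. The function name `discrepancy_gated_fusion` and the `temperature` parameter are hypothetical.

```python
import numpy as np

def discrepancy_gated_fusion(opt_feat, sar_feat, temperature=1.0):
    """Illustrative per-pixel blend of optical and SAR features.

    opt_feat, sar_feat: arrays of shape (C, H, W), assumed to be aligned.
    Returns the fused features (C, H, W) and the optical weight map (H, W).
    This is an assumed gating rule, not the formulation used in EDC.
    """
    # Per-pixel discrepancy: mean absolute difference across channels -> (H, W)
    disc = np.abs(opt_feat - sar_feat).mean(axis=0)
    # Larger discrepancy -> smaller trust in the optical feature, weight in (0, 1]
    w_opt = np.exp(-disc / temperature)
    # Blend: trusted pixels keep optical features, distrusted ones fall back to SAR
    fused = w_opt[None] * opt_feat + (1.0 - w_opt[None]) * sar_feat
    return fused, w_opt


# Toy usage: one pixel is "cloudy", so its optical feature disagrees with SAR.
rng = np.random.default_rng(0)
sar = rng.normal(size=(4, 3, 3))
opt = sar.copy()
opt[:, 1, 1] += 50.0  # simulated cloud-induced corruption at pixel (1, 1)
fused, w_opt = discrepancy_gated_fusion(opt, sar)
```

In this toy case the corrupted pixel receives a near-zero optical weight and the fused feature there falls back to the SAR value, while clean pixels pass through unchanged. A learned gate operating inside attention-based aggregation would behave differently; the sketch only shows the conditioning principle.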