Multimodal Diffusion Bridge with Attention-Based SAR Fusion for Satellite Image Cloud Removal

📅 2025-04-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Diffusion-based cloud removal for optical satellite imagery suffers from inefficient sampling trajectories and from weak fusion of SAR and optical data. To address both problems, this paper proposes DB-CR, a cross-modal cloud removal framework based on diffusion bridges. Rather than initializing sampling from pure Gaussian noise, the method directly models the mapping between the cloudy and cloud-free optical image distributions, with SAR imagery used as an additional input modality. It uses a two-branch multimodal backbone and attention-based cross-modality fusion blocks to enable efficient conditional sampling. Evaluated on the SEN12MS-CR dataset, the approach achieves state-of-the-art performance: it improves cloud-region reconstruction quality and structural consistency while accelerating inference by 3.2× over standard diffusion models. The core contributions are the formulation of cloud removal as a diffusion-bridge problem and a multimodal diffusion bridge architecture that targets both fidelity and sampling efficiency.

📝 Abstract
Deep learning has achieved some success in removing clouds from optical satellite images by fusing them with synthetic aperture radar (SAR) images. Recently, diffusion models have emerged as powerful tools for cloud removal, delivering higher-quality estimates than earlier methods by sampling from cloud-free distributions. However, diffusion models initiate sampling from pure Gaussian noise, which complicates the sampling trajectory and results in suboptimal performance. In addition, current methods fall short in effectively fusing SAR and optical data. To address these limitations, we propose Diffusion Bridges for Cloud Removal (DB-CR), which directly bridges the cloudy and cloud-free image distributions. We also propose a novel multimodal diffusion bridge architecture with a two-branch backbone for multimodal image restoration, incorporating an efficient backbone and dedicated cross-modality fusion blocks to extract and fuse features from SAR and optical images. By formulating cloud removal as a diffusion-bridge problem and leveraging this tailored architecture, DB-CR achieves high-fidelity results while being computationally efficient. We evaluated DB-CR on the SEN12MS-CR cloud-removal dataset, demonstrating that it achieves state-of-the-art results.
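To make the bridging idea concrete, here is a minimal sketch of a generic Brownian bridge pinned at a cloud-free image (t = 0) and a cloudy image (t = 1). This page does not give the exact bridge formulation used by DB-CR, so the function name `bridge_sample`, the noise scale `sigma`, and the flat-list pixel representation are all assumptions made for illustration.

```python
import math
import random


def bridge_sample(clean, cloudy, t, sigma=0.1, rng=None):
    """Draw x_t from a Brownian bridge pinned at `clean` (t=0) and `cloudy` (t=1).

    Hypothetical illustration only: this is a textbook Brownian bridge,
    not necessarily the bridge process used in the paper.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    rng = rng or random.Random()
    # The noise standard deviation vanishes at both endpoints,
    # so the path starts and ends exactly on the two images.
    std = sigma * math.sqrt(t * (1.0 - t))
    return [
        (1.0 - t) * c + t * y + std * rng.gauss(0.0, 1.0)
        for c, y in zip(clean, cloudy)
    ]


# Toy "images" as flat pixel lists (assumed values).
clean = [0.2, 0.4, 0.6]
cloudy = [0.9, 0.9, 0.9]

at_cloudy = bridge_sample(clean, cloudy, 1.0)  # sampling starts here
at_clean = bridge_sample(clean, cloudy, 0.0)   # and should end here
midpoint = bridge_sample(clean, cloudy, 0.5, rng=random.Random(0))
```

In this sketch the sampler would start from the cloudy image itself rather than from pure Gaussian noise, which is the contrast with standard diffusion sampling that the abstract draws.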
Problem

Research questions and friction points this paper is trying to address.

Bridging cloudy and cloud-free satellite image distributions
Improving SAR and optical data fusion efficiency
Enhancing cloud removal accuracy with diffusion models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion bridge connects cloudy and cloud-free distributions
Two-branch backbone for multimodal image restoration
Efficient cross-modality fusion blocks for SAR and optical
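As a rough illustration of attention-based cross-modality fusion, the sketch below lets optical feature vectors act as queries over SAR feature vectors (keys and values) and adds the attended SAR features back as a residual. The design of DB-CR's fusion blocks is not described on this page, so the function `cross_attention_fuse`, the absence of learned projections, and the choice of which modality supplies the queries are all assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attention_fuse(optical, sar):
    """Fuse SAR features into optical features with scaled dot-product attention.

    optical: list of query vectors, one per optical token.
    sar:     list of key/value vectors, one per SAR token (same dimension).
    Hypothetical sketch: no learned projections, single head.
    """
    d = len(optical[0])
    scale = 1.0 / math.sqrt(d)
    fused = []
    for q in optical:
        # Attention weights of this optical token over all SAR tokens.
        weights = softmax([dot(q, k) * scale for k in sar])
        # Weighted sum of SAR vectors (the values).
        attended = [sum(w * v[i] for w, v in zip(weights, sar)) for i in range(d)]
        # Residual connection keeps the original optical features.
        fused.append([qi + ai for qi, ai in zip(q, attended)])
    return fused


# Toy features (assumed values): two optical tokens, one SAR token.
optical_tokens = [[1.0, 2.0], [0.0, 1.0]]
sar_tokens = [[0.5, 0.5]]
fused_tokens = cross_attention_fuse(optical_tokens, sar_tokens)
```

With a single SAR token every attention weight is 1, so each optical vector is simply shifted by that SAR vector. With several SAR tokens, each optical token draws on the SAR tokens it is most similar to.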
🔎 Similar Papers
2024-08-15 | IEEE Geoscience and Remote Sensing Letters | Citations: 1