Task-Driven Prompt Learning: A Joint Framework for Multi-modal Cloud Removal and Segmentation

📅 2026-01-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Optical remote sensing imagery is frequently obstructed by cloud cover, and existing cloud removal methods often over-smooth textures and boundaries, hindering downstream semantic analysis. To address this, the authors propose TDP-CR, a framework that jointly optimizes multimodal cloud removal and land-cover segmentation in a task-driven manner to produce higher-quality analysis-ready data. The key innovations are a learnable degradation prompt-guided fusion mechanism (PGF), which adaptively integrates SAR and optical images by leveraging both global channel context and local spatial bias, and a parameter-efficient two-stage decoupled training strategy. On the LuojiaSET-OSFCR dataset, the method achieves a 0.18 dB PSNR gain and a 1.4% mIoU improvement over state-of-the-art baselines while using only 15% of their parameters, significantly boosting downstream analytical performance.

📝 Abstract
Optical remote sensing imagery is indispensable for Earth observation, yet persistent cloud occlusion limits its downstream utility. Most cloud removal (CR) methods are optimized for low-level fidelity and can over-smooth textures and boundaries that are critical for analysis-ready data (ARD), leading to a mismatch between visually plausible restoration and semantic utility. To bridge this gap, we propose TDP-CR, a task-driven multimodal framework that jointly performs cloud removal and land-cover segmentation. Central to our approach is a Prompt-Guided Fusion (PGF) mechanism, which utilizes a learnable degradation prompt to encode cloud thickness and spatial uncertainty. By combining global channel context with local prompt-conditioned spatial bias, PGF adaptively integrates Synthetic Aperture Radar (SAR) information only where optical data is corrupted. We further introduce a parameter-efficient two-phase training strategy that decouples reconstruction and semantic representation learning. Experiments on the LuojiaSET-OSFCR dataset demonstrate the superiority of our framework: TDP-CR surpasses heavy state-of-the-art baselines by 0.18 dB in PSNR while using only 15% of the parameters, and consistently achieves a 1.4% improvement in mIoU over multi-task competitors, effectively delivering analysis-ready data.
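The core fusion idea in the abstract — gate SAR information into the optical stream only where a degradation prompt flags cloud corruption, combining a global channel statistic with a local spatial bias — can be illustrated with a minimal NumPy sketch. This is not the authors' PGF implementation; the function name, gate formula, and shapes are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def prompt_guided_fusion(optical, sar, cloud_prompt):
    """Toy gated fusion (illustrative, not the paper's PGF):
    inject SAR features only where the degradation prompt says
    the optical signal is corrupted. Feature maps are (C, H, W);
    `cloud_prompt` is a (1, H, W) logit map of cloud thickness."""
    # Global channel context: per-channel mean of the optical features
    channel_ctx = optical.mean(axis=(1, 2), keepdims=True)   # (C, 1, 1)
    # Local spatial bias conditioned on the degradation prompt
    spatial_bias = sigmoid(cloud_prompt)                      # (1, H, W)
    # Gate is high only where the prompt indicates thick cloud
    gate = sigmoid(channel_ctx) * spatial_bias                # (C, H, W)
    return (1.0 - gate) * optical + gate * sar

# Tiny example: 2-channel 4x4 features with a cloudy top-left corner
rng = np.random.default_rng(0)
optical = rng.normal(size=(2, 4, 4))
sar = rng.normal(size=(2, 4, 4))
prompt = np.full((1, 4, 4), -6.0)   # mostly clear sky: gate near 0
prompt[0, :2, :2] = 6.0             # thick cloud: gate opens to SAR
fused = prompt_guided_fusion(optical, sar, prompt)
```

In clear regions the gate stays near zero, so the fused output reproduces the optical features almost exactly; only the cloud-flagged corner draws on SAR, matching the abstract's claim that SAR is integrated "only where optical data is corrupted".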
Problem

Research questions and friction points this paper is trying to address.

cloud removal
analysis-ready data
semantic utility
remote sensing
multi-modal fusion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Prompt-Guided Fusion
Task-Driven Learning
Multimodal Cloud Removal
Analysis-Ready Data
Parameter-Efficient Training
Zaiyan Zhang
Wuhan University
Jie Li
Wuhan University
Shaowei Shi
Wuhan University
Qiangqiang Yuan
Full Professor, School of Geodesy and Geomatics, Wuhan University
Remote sensing · Data fusion · Image processing