Task-Driven Prompt Learning: A Joint Framework for Multi-modal Cloud Removal and Segmentation

📅 2026-01-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Optical remote sensing imagery is frequently obstructed by cloud cover, and existing cloud removal methods often over-smooth textures and boundaries, hindering downstream semantic analysis. To address this, the authors propose TDP-CR, a framework that jointly optimizes multimodal cloud removal and land-cover segmentation in a task-driven manner to produce higher-quality analysis-ready data. The key innovations are a learnable degradation prompt-guided fusion mechanism (PGF), which adaptively integrates SAR and optical images by leveraging both global channel context and local spatial bias, and a parameter-efficient two-stage decoupled training strategy. On the LuojiaSET-OSFCR dataset, the method achieves a 0.18 dB PSNR gain and a 1.4% mIoU improvement over state-of-the-art baselines while using only 15% of their parameters, significantly boosting downstream analytical performance.

📝 Abstract
Optical remote sensing imagery is indispensable for Earth observation, yet persistent cloud occlusion limits its downstream utility. Most cloud removal (CR) methods are optimized for low-level fidelity and can over-smooth textures and boundaries that are critical for analysis-ready data (ARD), leading to a mismatch between visually plausible restoration and semantic utility. To bridge this gap, we propose TDP-CR, a task-driven multimodal framework that jointly performs cloud removal and land-cover segmentation. Central to our approach is a Prompt-Guided Fusion (PGF) mechanism, which utilizes a learnable degradation prompt to encode cloud thickness and spatial uncertainty. By combining global channel context with local prompt-conditioned spatial bias, PGF adaptively integrates Synthetic Aperture Radar (SAR) information only where optical data is corrupted. We further introduce a parameter-efficient two-phase training strategy that decouples reconstruction and semantic representation learning. Experiments on the LuojiaSET-OSFCR dataset demonstrate the superiority of our framework: TDP-CR surpasses heavy state-of-the-art baselines by 0.18 dB in PSNR while using only 15% of the parameters, and consistently achieves a 1.4% improvement in mIoU over multi-task competitors, effectively delivering analysis-ready data.
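The core fusion idea in the abstract — gate SAR information into the optical stream only where a degradation prompt flags cloud corruption, combining a global channel statistic with a local spatial bias — can be illustrated with a minimal NumPy sketch. This is not the authors' PGF implementation; the function name, gate formula, and shapes are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def prompt_guided_fusion(optical, sar, cloud_prompt):
    """Toy gated fusion (illustrative, not the paper's PGF):
    inject SAR features only where the degradation prompt says
    the optical signal is corrupted. Feature maps are (C, H, W);
    `cloud_prompt` is a (1, H, W) logit map of cloud thickness."""
    # Global channel context: per-channel mean of the optical features
    channel_ctx = optical.mean(axis=(1, 2), keepdims=True)   # (C, 1, 1)
    # Local spatial bias conditioned on the degradation prompt
    spatial_bias = sigmoid(cloud_prompt)                      # (1, H, W)
    # Gate is high only where the prompt indicates thick cloud
    gate = sigmoid(channel_ctx) * spatial_bias                # (C, H, W)
    return (1.0 - gate) * optical + gate * sar

# Tiny example: 2-channel 4x4 features with a cloudy top-left corner
rng = np.random.default_rng(0)
optical = rng.normal(size=(2, 4, 4))
sar = rng.normal(size=(2, 4, 4))
prompt = np.full((1, 4, 4), -6.0)   # mostly clear sky: gate near 0
prompt[0, :2, :2] = 6.0             # thick cloud: gate opens to SAR
fused = prompt_guided_fusion(optical, sar, prompt)
```

In clear regions the gate stays near zero, so the fused output reproduces the optical features almost exactly; only the cloud-flagged corner draws on SAR, matching the abstract's claim that SAR is integrated "only where optical data is corrupted".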
Problem

Research questions and friction points this paper is trying to address.

cloud removal
analysis-ready data
semantic utility
remote sensing
multi-modal fusion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Prompt-Guided Fusion
Task-Driven Learning
Multimodal Cloud Removal
Analysis-Ready Data
Parameter-Efficient Training
Zaiyan Zhang
Wuhan University
Jie Li
Wuhan University
Shaowei Shi
Wuhan University
Qiangqiang Yuan
Full Professor, School of Geodesy and Geomatics, Wuhan University
Remote sensing · Data fusion · Image processing