Physics-Guided VLM Priors for All-Cloud Removal

📅 2026-03-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of mixed cloud contamination—comprising both thin and thick clouds—in optical remote sensing imagery, which induces radiometric distortion and surface occlusion. Conventional approaches rely on explicit cloud-type segmentation, often leading to error propagation and discontinuous outputs. To overcome these limitations, this study proposes the first unified cloud removal framework that integrates semantic priors from a vision-language model (VLM) with a physical imaging model. By translating VLM outputs into physically meaningful scattering parameters and hallucination confidence maps, and introducing a confidence-guided soft-gating mechanism, the method adaptively fuses physics-based inversion with temporal reference reconstruction—eliminating the need for explicit cloud boundary detection. Evaluated on real Sentinel-2 data, the approach achieves high-fidelity, hallucination-free cloud removal, striking a superior balance between quantitative metrics and content preservation compared to existing methods.

📝 Abstract
Cloud removal is a fundamental challenge in optical remote sensing due to heterogeneous degradation: thin clouds distort radiometry via partial transmission, while thick clouds occlude the surface. Existing pipelines separate thin-cloud correction from thick-cloud reconstruction, requiring explicit cloud-type decisions and often leading to error accumulation and discontinuities in mixed-cloud scenes. Therefore, we propose a novel approach named Physical-VLM All-Cloud Removal (PhyVLM-CR), which integrates the semantic capability of a Vision-Language Model (VLM) into a physical restoration model, achieving high-fidelity unified cloud removal. Specifically, the cognitive prior from a VLM (e.g., Qwen) is transformed into physical scattering parameters and a hallucination confidence map. Leveraging this confidence map as a continuous soft gate, our method achieves unified restoration via adaptive weighting: it prioritizes physical inversion in high-transmission regions to preserve radiometric fidelity, while seamlessly transitioning to temporal reference reconstruction in low-confidence occluded areas. This mechanism eliminates the need for explicit boundary delineation, ensuring coherent removal across heterogeneous cloud covers. Experiments on real-world Sentinel-2 surface reflectance imagery confirm that our approach achieves a remarkable balance between cloud removal and content preservation, delivering hallucination-free results with substantially improved quantitative accuracy compared to existing methods.
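The confidence-guided soft gating described above can be sketched as a per-pixel blend between a physical inversion of the thin-cloud scattering model and a temporal reference. This is a minimal illustrative sketch, not the paper's implementation: the function name, array shapes, and the simple scattering model I = t·J + (1 − t)·A are assumptions for illustration.

```python
import numpy as np

def soft_gated_fusion(cloudy, temporal_ref, transmission, airlight, confidence):
    """Illustrative sketch of confidence-guided soft gating (not the paper's code).

    cloudy:       observed reflectance, shape (H, W, C)
    temporal_ref: cloud-free reference from another acquisition, shape (H, W, C)
    transmission: estimated per-pixel transmission t in (0, 1], shape (H, W, 1)
    airlight:     estimated atmospheric scattering term, shape (1, 1, C)
    confidence:   per-pixel soft gate g in [0, 1]; high g = trust physical inversion
    """
    # Physical inversion of the assumed scattering model I = t*J + (1 - t)*A,
    # solved for the clear-surface signal J.
    t = np.clip(transmission, 1e-3, 1.0)
    physical = (cloudy - (1.0 - t) * airlight) / t
    # Continuous soft gate: blend physical inversion with temporal
    # reconstruction, with no hard thin/thick cloud boundary decision.
    return confidence * physical + (1.0 - confidence) * temporal_ref
```

At g = 1 the output is purely the physical inversion (high-transmission thin cloud); at g = 0 it falls back entirely to the temporal reference (thick occlusion); intermediate values interpolate smoothly, which is what removes the need for explicit cloud-type segmentation.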
Problem

Research questions and friction points this paper is trying to address.

cloud removal
optical remote sensing
heterogeneous degradation
thin clouds
thick clouds
Innovation

Methods, ideas, or system contributions that make the work stand out.

Physics-Guided
Vision-Language Model
All-Cloud Removal
Hallucination-Free Restoration
Adaptive Weighting