🤖 AI Summary
Existing controllable remote sensing image generation methods rely on segmentation or edge priors, which model complex terrain and atmospheric phenomena poorly and often yield images that lack fine detail and photorealism. To address this limitation, this work proposes the D2-CDIG framework, which feeds Digital Elevation Models (DEMs) and cloud-fog information into a diffusion model as dual priors. A dual-branch architecture decouples the ground and atmospheric generation pathways, control signals are injected hierarchically across layers, and an adjustable cloud-fog slider sets cloud thickness and distribution, so that terrain structure and cloud cover can be controlled independently. The authors report gains in realism, textural richness, and controllability over segmentation- and edge-based methods, and position the generated images as high-quality synthetic data for large remote sensing models and downstream applications.
📝 Abstract
Remote sensing image generation provides a reliable data foundation for large remote sensing models and downstream tasks. However, existing controllable remote sensing image generation methods typically rely on traditional control signals such as segmentation maps and edge detection, which do not fully leverage terrain or atmospheric conditions. As a result, the generated images often lack accuracy and naturalness when depicting complex terrain and atmospheric phenomena. In this paper, we propose a novel remote sensing image generation framework, D2-CDIG, which integrates diffusion models with a dual-prior control mechanism. By incorporating both a Digital Elevation Model (DEM) and cloud-fog information as dual prior knowledge, D2-CDIG precisely controls ground features and atmospheric phenomena within the generated images. Specifically, D2-CDIG decouples the terrain and atmospheric generation processes by controlling a ground branch and an atmospheric branch independently. Additionally, a fine-grained cloud-fog slider is introduced to flexibly adjust cloud thickness and distribution. During training, the ground and atmospheric control signals are injected layer by layer so that ground and atmosphere blend seamlessly within the images. Compared to traditional methods based on segmentation or edge detection, D2-CDIG shows significant improvements in image quality, detail richness, and realism. D2-CDIG offers a flexible and precise solution for remote sensing image generation, providing high-quality data for training large remote sensing models and for downstream tasks.
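The abstract does not specify how the two branches are fused, so the following is only a minimal sketch of one plausible reading: each branch produces a per-layer residual, the ground (DEM) residual is added at full strength, and the atmospheric (cloud-fog) residual is scaled by a slider value in [0, 1]. The function name `inject_dual_controls`, the additive fusion rule, the slider range, and the toy list-based "features" are all assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Sequence

Vector = List[float]


def inject_dual_controls(
    features: Sequence[Vector],
    ground_residuals: Sequence[Vector],
    atmos_residuals: Sequence[Vector],
    cloud_slider: float,
) -> List[Vector]:
    """Add per-layer control residuals to per-layer backbone features.

    Hypothetical sketch: the ground (DEM) branch is always applied at full
    strength, while the atmospheric (cloud-fog) branch is scaled by
    `cloud_slider` in [0, 1]. None of these names come from the paper.
    """
    if not 0.0 <= cloud_slider <= 1.0:
        raise ValueError("cloud_slider must lie in [0, 1]")
    if not (len(features) == len(ground_residuals) == len(atmos_residuals)):
        raise ValueError("all inputs must have one entry per layer")

    fused: List[Vector] = []
    for feat, ground, atmos in zip(features, ground_residuals, atmos_residuals):
        if not (len(feat) == len(ground) == len(atmos)):
            raise ValueError("per-layer vectors must share a length")
        # Assumed fusion rule: feature + ground + slider * atmosphere.
        fused.append(
            [f + g + cloud_slider * a for f, g, a in zip(feat, ground, atmos)]
        )
    return fused


# Toy example: two "layers" of three-element features.
features = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
ground = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]
atmos = [[2.0, 2.0, 2.0], [4.0, 4.0, 4.0]]

clear_sky = inject_dual_controls(features, ground, atmos, cloud_slider=0.0)
half_cloud = inject_dual_controls(features, ground, atmos, cloud_slider=0.5)
```

With the slider at 0.0 only the ground residual reaches the features, which mirrors the decoupling the abstract describes: terrain control is unaffected by how much cloud is requested. A real implementation would operate on feature tensors inside the diffusion backbone rather than on Python lists.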