RSGen: Enhancing Layout-Driven Remote Sensing Image Generation with Diverse Edge Guidance

📅 2026-03-16
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the limitations of existing layout-driven remote sensing image generation methods, which often lack fine-grained control and fail to strictly adhere to bounding box constraints. To overcome these challenges, the authors propose RSGen, a plug-and-play edge-guidance framework built on diffusion models. RSGen composites edge maps from retrieved training instances, diversifies them via image-to-image generation, and feeds the resulting edge maps to existing layout-to-image (L2I) models as conditional inputs, using a two-stage progressive strategy to enforce pixel-level control. The approach improves detail fidelity and bounding box consistency while preserving the overall layout structure. On the DOTA dataset, combining RSGen with the CC-Diff model improves YOLOScore by +9.8 mAP50 and +12.0 mAP50-95, and yields a 1.6-point mAP gain on the downstream detection task.
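To make stage 1 of the pipeline concrete, below is a minimal sketch under stated assumptions: Canny edges of retrieved instance crops are composited into their layout boxes, and the composite is then diversified with an off-the-shelf image-to-image diffusion pipeline. The function name, dummy inputs, model checkpoint, thresholds, and strength value are all illustrative assumptions, not the authors' implementation.

```python
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

def composite_edge_map(canvas_hw, boxes, crops):
    """Paste Canny edges of instance crops into their (x, y, w, h) boxes."""
    canvas = np.zeros(canvas_hw, dtype=np.uint8)
    for (x, y, w, h), crop in zip(boxes, crops):
        crop = cv2.resize(crop, (w, h))
        edges = cv2.Canny(cv2.cvtColor(crop, cv2.COLOR_RGB2GRAY), 100, 200)
        canvas[y:y + h, x:x + w] = np.maximum(canvas[y:y + h, x:x + w], edges)
    return Image.fromarray(canvas).convert("RGB")

# Dummy stand-ins for instance crops retrieved from the training set.
boxes = [(64, 96, 128, 80), (300, 200, 96, 96)]
crops = [np.random.randint(0, 255, (80, 128, 3), dtype=np.uint8),
         np.random.randint(0, 255, (96, 96, 3), dtype=np.uint8)]
edge_map = composite_edge_map((512, 512), boxes, crops)

# Diversify the composited edge map: a low strength keeps the layout intact
# while varying edge detail (checkpoint and strength are assumptions).
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
diverse_edge_map = pipe(
    prompt="edge map of an aerial remote sensing scene",
    image=edge_map, strength=0.4, guidance_scale=7.5,
).images[0]
diverse_edge_map.save("diverse_edge_map.png")
```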

📝 Abstract
Diffusion models have significantly mitigated the impact of annotated data scarcity in remote sensing (RS). Although recent approaches have successfully harnessed these models to enable diverse and controllable Layout-to-Image (L2I) synthesis, they still suffer from limited fine-grained control and fail to strictly adhere to bounding box constraints. To address these limitations, we propose RSGen, a plug-and-play framework that leverages diverse edge guidance to enhance layout-driven RS image generation. Specifically, RSGen employs a progressive enhancement strategy: 1) it first enriches the diversity of edge maps composited from retrieved training instances via Image-to-Image generation; and 2) subsequently utilizes these diverse edge maps as conditioning for existing L2I models to enforce pixel-level control within bounding boxes, ensuring the generated instances strictly adhere to the layout. Extensive experiments across three baseline models demonstrate that RSGen significantly boosts the capabilities of existing L2I models. For instance, with CC-Diff on the DOTA dataset for oriented object detection, we achieve remarkable gains of +9.8/+12.0 in YOLOScore mAP50/mAP50-95 and +1.6 in mAP on the downstream detection task. Our code will be publicly available: https://github.com/D-Robotics-AI-Lab/RSGen
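Step 2 of the abstract amounts to edge-conditioned generation that enforces pixel-level control inside bounding boxes. RSGen's actual plug-in integration with L2I models such as CC-Diff is not reproduced here; the sketch below illustrates the general idea with a standard Canny ControlNet pipeline from diffusers. The checkpoints, prompt, file path, and conditioning scale are illustrative assumptions.

```python
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Edge-conditioned generation: the diversified edge map supplies pixel-level
# structure, so generated instances stay within their bounding boxes.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

edge_map = Image.open("diverse_edge_map.png")  # a stage-1 output (hypothetical path)
image = pipe(
    prompt="remote sensing image, airport apron with parked planes",
    image=edge_map,
    num_inference_steps=30,
    controlnet_conditioning_scale=1.0,  # how strongly edges steer generation
).images[0]
image.save("rsgen_style_sample.png")
```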
Problem

Research questions and friction points this paper is trying to address.

layout-driven generation
remote sensing image generation
bounding box constraints
fine-grained control
edge guidance
Innovation

Methods, ideas, or system contributions that make the work stand out.

edge guidance
layout-to-image synthesis
diffusion models
remote sensing image generation
bounding box constraints