RSEdit: Text-Guided Image Editing for Remote Sensing

📅 2026-03-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of general-purpose text-guided image editors when applied to remote sensing imagery, where they often introduce artifacts and hallucinated objects and violate orthorectified geometric constraints, failing to meet physical and spatial consistency requirements. To overcome these challenges, we propose the first text-guided editing framework designed specifically for remote sensing images. By integrating channel-wise concatenation and contextual token fusion, we adapt pre-trained diffusion models (U-Net/DiT) into instruction-following remote sensing editors. Our approach further incorporates bi-temporal structural priors and spatial constraints to compensate for the generic models' lack of domain-specific knowledge and conditional modeling. Trained on over 60,000 semantically rich bi-temporal remote sensing image pairs, the method significantly outperforms both general-purpose and commercial baselines across diverse scenarios, including disaster impact assessment, urban expansion, and seasonal changes, demonstrating strong generalization and serving as a reliable data-generation engine for downstream analysis.
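The two conditioning schemes the summary names (channel-wise concatenation for U-Net backbones, in-context token concatenation for DiT backbones) can be sketched with array shapes alone. All shapes and variable names below are illustrative assumptions for this sketch, not details taken from the paper:

```python
import numpy as np

# Hypothetical sizes; the paper's actual architecture details are not
# given on this page.
rng = np.random.default_rng(0)

# --- Channel-wise concatenation (U-Net style) ---
B, C, H, W = 2, 4, 32, 32                          # batch, latent channels, spatial dims
noisy_latent = rng.standard_normal((B, C, H, W))   # x_t being denoised
source_latent = rng.standard_normal((B, C, H, W))  # encoded source RS image
# The denoiser's first conv is widened from C to 2*C input channels so the
# source image conditions every denoising step.
denoiser_input = np.concatenate([noisy_latent, source_latent], axis=1)
print(denoiser_input.shape)  # (2, 8, 32, 32)

# --- In-context token concatenation (DiT style) ---
N, D = 256, 64                                     # tokens per image, hidden dim
noisy_tokens = rng.standard_normal((B, N, D))
source_tokens = rng.standard_normal((B, N, D))
# Source-image tokens are appended along the sequence axis so self-attention
# can attend jointly over both images.
seq = np.concatenate([noisy_tokens, source_tokens], axis=1)
print(seq.shape)  # (2, 512, 64)
```

Either way, the source image enters the model as extra input rather than through a separate conditioning network, which is what lets a pre-trained text-to-image backbone be adapted with minimal architectural change.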

📝 Abstract
General-domain text-guided image editors achieve strong photorealism but introduce artifacts, hallucinate objects, and break the orthographic constraints of remote sensing (RS) imagery. We trace this gap to two high-level causes: (i) limited RS world knowledge in pre-trained models, and (ii) conditioning schemes that misalign with the bi-temporal structure and spatial priors of Earth observation data. We present RSEdit, a unified framework that adapts pre-trained text-to-image diffusion models (both U-Net and DiT) into instruction-following RS editors via channel concatenation and in-context token concatenation. Trained on over 60,000 semantically rich bi-temporal remote sensing image pairs, RSEdit learns precise, physically coherent edits while preserving geospatial content. Experiments show clear gains over general and commercial baselines, demonstrating strong generalizability across diverse scenarios including disaster impacts, urban growth, and seasonal shifts, positioning RSEdit as a robust data engine for downstream analysis. We will release code, pretrained models, evaluation protocols, training logs, and generated results for full reproducibility. Code: https://github.com/Bili-Sakura/RSEdit-Preview
Problem

Research questions and friction points this paper is trying to address.

remote sensing
text-guided image editing
photorealism artifacts
orthographic constraints
geospatial coherence
Innovation

Methods, ideas, or system contributions that make the work stand out.

text-guided image editing
remote sensing
diffusion models
bi-temporal imagery
geospatial consistency