TerraGen: A Unified Multi-Task Layout Generation Framework for Remote Sensing Data Augmentation

📅 2025-10-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing remote sensing generative data augmentation methods suffer from task isolation and insufficient geospatial modeling, hindering multi-task synergy. To address this, we propose TerraGen—the first unified multi-task remote sensing layout-to-image generation framework. It employs a geospatial layout encoder and a multi-scale feature injection mechanism to enable structure-controllable image synthesis shared across detection, segmentation, and other downstream tasks. We introduce the first large-scale, multi-task remote sensing layout dataset and establish a standardized evaluation protocol. TerraGen adopts a layout-to-image generation paradigm enhanced by a mask-weighted loss, jointly optimizing global structural coherence and local detail fidelity. Experiments demonstrate that TerraGen achieves state-of-the-art performance in both synthetic image quality and downstream task accuracy—under both full-shot and few-shot settings—while significantly improving cross-task generalization capability.
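The summary does not give the exact form of the mask-weighted loss, so the following is only an illustrative sketch of the general idea: a reconstruction error in which pixels inside the layout mask count more than background pixels. The function name `mask_weighted_mse`, the `fg_weight` parameter, and the choice of squared error are all assumptions made for this example, not the authors' formulation.

```python
import numpy as np

def mask_weighted_mse(pred, target, mask, fg_weight=2.0):
    """Weighted mean squared error (hypothetical form, not the paper's loss).

    pred, target: arrays of shape (H, W) or (H, W, C).
    mask: binary array of shape (H, W); 1 marks pixels covered by the layout.
    fg_weight: assumed weight for masked pixels; background pixels get 1.0.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mask = np.asarray(mask, dtype=np.float64)

    # Per-pixel weights: 1.0 on background, fg_weight inside the mask.
    weights = 1.0 + (fg_weight - 1.0) * mask
    if pred.ndim == 3:
        weights = weights[..., None]  # share the weight across channels

    sq_err = (pred - target) ** 2
    weights = np.broadcast_to(weights, sq_err.shape)

    # Normalize by the total weight so the result stays a weighted average.
    return float((weights * sq_err).sum() / weights.sum())
```

With `fg_weight=1.0` this reduces to a plain mean squared error, which makes the effect of the mask term easy to isolate when experimenting.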

📝 Abstract

Remote sensing vision tasks require extensive labeled data across multiple, interconnected domains. However, current generative data augmentation frameworks are task-isolated, i.e., each vision task requires training an independent generative model, and they ignore the modeling of geographical information and spatial constraints. To address these issues, we propose TerraGen, a unified layout-to-image generation framework that enables flexible, spatially controllable synthesis of remote sensing imagery for various high-level vision tasks, e.g., detection, segmentation, and extraction. Specifically, TerraGen introduces a geographic-spatial layout encoder that unifies bounding box and segmentation mask inputs, combined with a multi-scale injection scheme and a mask-weighted loss to explicitly encode spatial constraints, from global structures to fine details. We also construct the first large-scale multi-task remote sensing layout generation dataset, containing 45k images, and establish a standardized evaluation protocol for this task. Experimental results show that TerraGen achieves the best image generation quality across diverse tasks. Additionally, TerraGen can be used as a universal data-augmentation generator, significantly enhancing downstream task performance and demonstrating robust cross-task generalisation in both full-data and few-shot scenarios.
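To make the idea of unifying bounding box and segmentation mask inputs more concrete, here is a minimal sketch of one possible shared representation: boxes are rasterized into the same per-pixel class map that a segmentation mask already uses. This is not the paper's layout encoder, which is a learned module whose details are not given here; the helper names `boxes_to_label_map` and `merge_layouts`, the box tuple format, and the rule that segmentation labels override box labels are all assumptions for illustration.

```python
import numpy as np

def boxes_to_label_map(boxes, height, width):
    """Rasterize boxes into an (H, W) integer class map (0 = background).

    boxes: iterable of (class_id, x_min, y_min, x_max, y_max) in pixel
    coordinates, with x_max and y_max exclusive (an assumed convention).
    Later boxes overwrite earlier ones where they overlap.
    """
    layout = np.zeros((height, width), dtype=np.int64)
    for class_id, x0, y0, x1, y1 in boxes:
        # Clip to the image so partially visible boxes are still drawn.
        x0, x1 = max(0, x0), min(width, x1)
        y0, y1 = max(0, y0), min(height, y1)
        if x0 < x1 and y0 < y1:
            layout[y0:y1, x0:x1] = class_id
    return layout

def merge_layouts(box_layout, seg_mask):
    """Combine both sources; nonzero segmentation labels take priority."""
    return np.where(seg_mask > 0, seg_mask, box_layout)
```

A single class map like this could then be fed to any layout-conditioned generator, regardless of whether the annotation originally came from a detection or a segmentation dataset.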

Problem

Research questions and friction points this paper is trying to address.

Generating remote sensing imagery from layouts within one unified multi-task framework

Addresses task-isolated models lacking geographical spatial constraints

Enables controllable data augmentation for detection and segmentation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified layout-to-image generation for multiple vision tasks

Geographic-spatial encoder with multi-scale injection scheme

First large-scale multi-task remote sensing layout dataset

🔎 Similar Papers

Data Augmentation in Earth Observation: A Diffusion Model Approach

2024-06-10 · Information · Citations: 1

Evaluating the Efficacy of Cut-and-Paste Data Augmentation in Semantic Segmentation for Satellite Imagery

2024-04-08 · IEEE International Geoscience and Remote Sensing Symposium · Citations: 1
