TerraGen: A Unified Multi-Task Layout Generation Framework for Remote Sensing Data Augmentation

📅 2025-10-24

🤖 AI Summary
Existing generative data augmentation methods for remote sensing suffer from task isolation and insufficient geospatial modeling, which hinders multi-task synergy. To address this, we propose TerraGen, the first unified multi-task remote sensing layout-to-image generation framework. It employs a geospatial layout encoder and a multi-scale feature injection mechanism to enable structure-controllable image synthesis shared across detection, segmentation, and other downstream tasks, and it is trained with a mask-weighted loss that jointly optimizes global structural coherence and local detail fidelity. We also introduce the first large-scale multi-task remote sensing layout dataset and establish a standardized evaluation protocol. Experiments show that TerraGen achieves state-of-the-art synthetic image quality and downstream task accuracy in both full-shot and few-shot settings, while significantly improving cross-task generalization.
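The summary names a mask-weighted loss but does not give its form. A minimal sketch, assuming it is a per-pixel squared error (as in standard diffusion noise-prediction training) in which pixels covered by the layout mask are up-weighted by a hypothetical factor `fg_weight`. The function name, the weighting scheme, and the normalization by total weight are illustrative assumptions, not the authors' published formulation.

```python
from typing import List

Grid = List[List[float]]
Mask = List[List[int]]


def mask_weighted_mse(pred: Grid, target: Grid, mask: Mask, fg_weight: float = 2.0) -> float:
    """Weighted mean squared error over a 2-D grid (hypothetical sketch).

    Pixels where mask == 1 (inside a layout region) get weight `fg_weight`;
    all other pixels get weight 1.0. Dividing by the summed weights keeps the
    result on the same scale as plain MSE, to which it reduces when fg_weight == 1.
    """
    total, norm = 0.0, 0.0
    for pred_row, tgt_row, mask_row in zip(pred, target, mask):
        for p, t, m in zip(pred_row, tgt_row, mask_row):
            w = fg_weight if m else 1.0
            total += w * (p - t) ** 2
            norm += w
    if norm == 0.0:
        raise ValueError("empty input grid")
    return total / norm
```

With `fg_weight=3.0`, a unit error on the single masked pixel of a 2×2 grid yields 0.5, while the same error on an unmasked pixel yields 1/6, so errors inside layout regions dominate the gradient signal. In a real training loop this would operate on framework tensors rather than Python lists.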


📝 Abstract
Remote sensing vision tasks require extensive labeled data across multiple, interconnected domains. However, current generative data augmentation frameworks are task-isolated, i.e., each vision task requires training an independent generative model, and they ignore the modeling of geographical information and spatial constraints. To address these issues, we propose TerraGen, a unified layout-to-image generation framework that enables flexible, spatially controllable synthesis of remote sensing imagery for various high-level vision tasks, e.g., detection, segmentation, and extraction. Specifically, TerraGen introduces a geographic-spatial layout encoder that unifies bounding box and segmentation mask inputs, combined with a multi-scale injection scheme and a mask-weighted loss to explicitly encode spatial constraints, from global structures to fine details. We also construct the first large-scale multi-task remote sensing layout generation dataset, containing 45k images, and establish a standardized evaluation protocol for this task. Experimental results show that TerraGen achieves the best generated-image quality across diverse tasks. Additionally, TerraGen can serve as a universal data-augmentation generator, significantly enhancing downstream task performance and demonstrating robust cross-task generalization in both full-data and few-shot scenarios.
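The abstract says the layout encoder unifies bounding box and segmentation mask inputs but does not say how. One simple way to put both annotation types into a common form is to rasterize each box into a per-class binary channel on the same H×W canvas as the masks, so a single encoder sees one representation regardless of task. The sketch below assumes pixel-coordinate boxes with exclusive right/bottom edges and integer class ids; `Box` and `rasterize_layout` are hypothetical names, and TerraGen's encoder may use a different (e.g., learned or tokenized) representation.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Mask = List[List[int]]


@dataclass
class Box:
    """Hypothetical axis-aligned box in pixel coordinates."""
    cls: int
    x0: int
    y0: int
    x1: int  # exclusive
    y1: int  # exclusive


def rasterize_layout(height: int, width: int, num_classes: int,
                     boxes: Sequence[Box] = (),
                     masks: Sequence[Tuple[int, Mask]] = ()) -> List[Mask]:
    """Return one binary H x W channel per class, merging box and mask annotations."""
    layout = [[[0] * width for _ in range(height)] for _ in range(num_classes)]
    for b in boxes:
        # Clip to the canvas so boxes extending past the border do not raise IndexError.
        for y in range(max(b.y0, 0), min(b.y1, height)):
            for x in range(max(b.x0, 0), min(b.x1, width)):
                layout[b.cls][y][x] = 1
    for cls, m in masks:
        # Masks are assumed to already be H x W; nonzero pixels mark the class region.
        for y in range(height):
            for x in range(width):
                if m[y][x]:
                    layout[cls][y][x] = 1
    return layout
```

A detection sample (boxes only) and a segmentation sample (masks only) both come out as a `num_classes × H × W` stack, which is what lets one generator be conditioned on either.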
Problem

Research questions and friction points this paper is trying to address.

Generating remote sensing imagery for multiple tasks from layouts with one unified framework
Replacing task-isolated generative models that ignore geographic and spatial constraints
Enabling controllable data augmentation for detection and segmentation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified layout-to-image generation for multiple vision tasks
Geographic-spatial layout encoder with a multi-scale injection scheme and mask-weighted loss
First large-scale multi-task remote sensing layout generation dataset (45k images)
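The multi-scale injection scheme is listed without details. One plausible reading, sketched here under stated assumptions: the layout mask is downsampled to several resolutions and each copy is combined with a feature map of matching size. Max-pooling is chosen so that small objects such as vehicles do not vanish at coarse scales. The factors `(1, 2, 4)`, the max-pooling choice, and the additive `inject` step are all hypothetical; TerraGen's actual conditioning mechanism may differ (for example, learned projections or attention).

```python
from typing import Dict, List, Sequence

Mask = List[List[int]]
Features = List[List[float]]


def max_pool(mask: Mask, factor: int) -> Mask:
    """Downsample by `factor` with non-overlapping max-pooling."""
    h, w = len(mask), len(mask[0])
    if h % factor or w % factor:
        raise ValueError("mask size must be divisible by the pooling factor")
    return [
        [max(mask[y * factor + dy][x * factor + dx]
             for dy in range(factor) for dx in range(factor))
         for x in range(w // factor)]
        for y in range(h // factor)
    ]


def layout_pyramid(mask: Mask, factors: Sequence[int] = (1, 2, 4)) -> Dict[int, Mask]:
    """Map each downsampling factor to the layout mask at that resolution."""
    return {f: max_pool(mask, f) for f in factors}


def inject(features: Features, mask: Mask, alpha: float = 1.0) -> Features:
    """Toy additive injection: add alpha * mask to a same-sized feature map."""
    return [[f + alpha * m for f, m in zip(frow, mrow)]
            for frow, mrow in zip(features, mask)]
```

For a 4×4 mask whose only foreground pixel is the bottom-right corner, the factor-2 level is `[[0, 0], [0, 1]]` and the factor-4 level is `[[1]]`: the object survives at every scale, which is the property the coarse-to-fine conditioning relies on.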