MMO-IG: Multi-Class and Multi-Scale Object Image Generation for Remote Sensing

📅 2024-12-18

🏛️ IEEE Transactions on Geoscience and Remote Sensing

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

Existing remote sensing image generation models focus on synthesizing a plausible global layout but do not produce the fine-grained, multi-class, multi-scale instance-level annotations that object detection requires. To address this, we propose MMO-IG, a detection-oriented generative framework built on diffusion models. It combines three components: (1) an iso-spacing instance map (ISIM) that encodes each foreground and background instance region with an iso-spacing value, (2) a spatial-cross dependency knowledge graph (SCDKG) that models the spatial interdependencies among objects and supplies region embeddings, and (3) a structured object distribution instruction (SODI) that, together with the SCDKG-based ISIM, guides image content from a global perspective. Experiments show strong generation quality for images with dense multi-class, multi-scale object labels, and detectors pre-trained with MMO-IG perform well on real-world remote sensing datasets.

📝 Abstract

Rapid progress in deep generative models (DGMs) has significantly advanced research in computer vision, providing a cost-effective alternative to acquiring vast quantities of expensive imagery. However, existing methods predominantly focus on synthesizing remote sensing (RS) images aligned with real images in a global layout view, which limits their applicability in RS image object detection (RSIOD) research. To address this limitation, we propose a multi-class and multi-scale object (MMO) image generator based on DGMs, termed MMO-IG, designed to generate RS images with supervised object labels from global and local aspects simultaneously. Specifically, from the local view, MMO-IG encodes various RS instances using an iso-spacing instance map (ISIM). During the generation process, it decodes each instance region with its iso-spacing value in the ISIM (corresponding to both background and foreground instances) to produce RS images through the denoising process of diffusion models. Considering the complex interdependencies among MMOs, we construct a spatial-cross dependency knowledge graph (SCDKG). This ensures a realistic and reliable multidirectional distribution among MMOs for region embedding, thereby reducing the discrepancy between source and target domains. In addition, we propose a structured object distribution instruction (SODI) that, together with the SCDKG-based ISIM, guides the generation of synthesized RS image content from a global aspect. Extensive experimental results demonstrate that MMO-IG exhibits superior generation capabilities for RS images with dense MMO-supervised labels, and RS detectors pre-trained with MMO-IG show excellent performance on real-world datasets.
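The abstract does not spell out how the ISIM is built, so the following is only a minimal sketch of one plausible reading: each class receives an evenly spaced integer value, background is 0, and bounding boxes are rasterized into a grid that a generator could condition on. The box format, the function names (`iso_spacing_values`, `build_instance_map`, `decode_region`), and the 0 to 255 value range are all assumptions made for illustration, not details from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical box format: (x0, y0, x1, y1, class_name), with x1/y1 exclusive.
Box = Tuple[int, int, int, int, str]


def iso_spacing_values(class_names: List[str], max_value: int = 255) -> Dict[str, int]:
    """Assign each class an evenly spaced value; 0 is reserved for background (assumption)."""
    if not class_names:
        raise ValueError("at least one class is required")
    step = max_value // len(class_names)
    return {name: step * (i + 1) for i, name in enumerate(class_names)}


def build_instance_map(width: int, height: int, boxes: List[Box],
                       values: Dict[str, int]) -> List[List[int]]:
    """Rasterize boxes into a height x width grid holding each region's class value."""
    grid = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1, name in boxes:
        # Clip each box to the grid so out-of-range coordinates are ignored.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                grid[y][x] = values[name]
    return grid


def decode_region(grid: List[List[int]], values: Dict[str, int], x: int, y: int) -> str:
    """Map a grid value back to its class name, or 'background' for unassigned cells."""
    inverse = {v: k for k, v in values.items()}
    return inverse.get(grid[y][x], "background")


values = iso_spacing_values(["plane", "ship", "vehicle"])  # step is 255 // 3 = 85
grid = build_instance_map(8, 4, [(1, 1, 3, 3, "plane"), (4, 0, 6, 2, "ship")], values)
print(values)
print(decode_region(grid, values, 4, 0))
```

Because the values are evenly spaced, a small perturbation of a pixel value still rounds to the nearest class, which is one reason a uniformly spaced encoding might be preferred over arbitrary class IDs; whether the paper relies on this property is not stated here.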

Problem

Research questions and friction points this paper is trying to address.

Generates multi-class, multi-scale remote sensing images.

Improves object detection in remote sensing imagery.

Reduces domain discrepancy with spatial-cross dependency knowledge graph.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-class, multi-scale object image generator

Iso-spacing instance map for local encoding

Spatial-cross dependency knowledge graph for realistic distribution
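As a rough illustration of what a spatial dependency graph over object classes could look like, the sketch below counts how often one class appears in a given coarse direction relative to another across annotated scenes. This is an assumed, simplified stand-in: the instance format, the four-direction scheme, and the names `direction` and `build_dependency_graph` are hypothetical, and the paper's actual SCDKG construction is not described on this page.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical instance format: (class_name, center_x, center_y) in image coordinates.
Instance = Tuple[str, float, float]
Edge = Tuple[str, str, str]  # (source_class, direction, target_class)


def direction(ax: float, ay: float, bx: float, by: float) -> str:
    """Coarse direction of point b relative to point a (y grows downward)."""
    dx, dy = bx - ax, by - ay
    if abs(dx) >= abs(dy):
        return "right" if dx >= 0 else "left"
    return "below" if dy >= 0 else "above"


def build_dependency_graph(scenes: List[List[Instance]]) -> Counter:
    """Count directed (class, direction, class) relations over all ordered instance pairs."""
    edges: Counter = Counter()
    for scene in scenes:
        for i, (class_a, ax, ay) in enumerate(scene):
            for j, (class_b, bx, by) in enumerate(scene):
                if i != j:
                    edges[(class_a, direction(ax, ay, bx, by), class_b)] += 1
    return edges


scenes = [
    [("harbor", 10, 10), ("ship", 30, 12)],
    [("harbor", 5, 5), ("ship", 40, 8)],
]
graph = build_dependency_graph(scenes)
print(graph.most_common())
```

Edge counts like these could be normalized into probabilities and used to check whether a proposed layout places classes in plausible relative positions, which is in the spirit of the "realistic distribution" goal listed above.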

🔎 Similar Papers

Scaling Efficient Masked Image Modeling on Large Remote Sensing Dataset

2024-06-17 · Citations: 3
