🤖 AI Summary
Earth observation (EO) imagery suffers from semantic scarcity in critical dimensions—such as snow cover, flooding, urbanization, and wildfires—due to atmospheric interference, seasonal variability, and spatial coverage limitations; conventional geometric augmentation fails to enrich semantic diversity, thereby constraining AI model performance. To address this, we propose a four-stage diffusion-based augmentation framework: (1) generating semantics-controllable instructions via meta-prompting; (2) performing fine-grained multimodal annotation using vision-language models (e.g., CLIP, Flamingo); (3) domain-specific fine-tuning of diffusion models tailored to EO data; and (4) establishing a synthetic-data generation–evaluation iterative loop. This work introduces the first paradigm integrating instruction-guided generation, multimodal annotation, and domain-adaptive diffusion modeling to overcome semantic impoverishment in EO. Evaluated across four benchmark tasks, our approach consistently outperforms conventional methods, significantly improving accuracy and generalization in downstream applications—including change detection and disaster identification.
📝 Abstract
High-quality Earth Observation (EO) imagery is essential for accurate analysis and informed decision-making across sectors. However, data scarcity caused by atmospheric conditions, seasonal variations, and limited geographical coverage hinders the effective application of Artificial Intelligence (AI) in EO. Traditional data augmentation techniques, which rely on basic parameterized image transformations, often fail to introduce sufficient diversity across key semantic axes. These axes include natural changes such as snow and floods, human impacts such as urbanization and road construction, and disasters such as wildfires and storms; this lack of semantic diversity limits the accuracy of AI models in EO applications. To address this, we propose a four-stage data augmentation approach that integrates diffusion models to enhance semantic diversity. Our method employs meta-prompts for instruction generation, vision–language models for rich captioning, EO-specific diffusion model fine-tuning, and iterative data augmentation. Extensive experiments using four augmentation techniques demonstrate that our approach consistently outperforms established methods, generating semantically diverse EO images and improving AI model performance.
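The four-stage pipeline summarized above can be sketched as a minimal Python skeleton. Everything here is a hypothetical stand-in: the instruction templates, the captioning function, the image-editing call, and the quality score are illustrative placeholders, not the paper's actual prompts, models, or evaluation criteria. In a real system, stages 2 and 3 would invoke a vision–language model and an EO-fine-tuned instruction-guided diffusion model.

```python
import random

random.seed(0)

# Stage 1: meta-prompting — expand a semantic axis into concrete edit
# instructions (hypothetical templates, not the paper's actual prompts).
def generate_instructions(axis):
    templates = {
        "snow": ["add light snow cover", "add heavy snow cover"],
        "flood": ["flood the low-lying areas", "raise the water level"],
        "urbanization": ["add new buildings", "add a paved road"],
        "wildfire": ["add burn scars", "add active fire and smoke"],
    }
    return templates[axis]

# Stage 2: captioning stand-in — a real system would call a
# vision-language model to describe the source tile.
def caption(image_id):
    return f"satellite view of scene {image_id}"

# Stage 3: diffusion-editing stand-in — a real system would run an
# EO-fine-tuned, instruction-guided diffusion model on the image.
def edit_image(image_id, caption_text, instruction):
    return {"source": image_id, "prompt": f"{caption_text}; {instruction}"}

# Hypothetical quality score; a real loop would evaluate synthetic
# samples with a downstream model or image-text similarity.
def quality_score(sample):
    return random.random()

# Stage 4: generation-evaluation loop — keep only samples whose
# score clears the acceptance threshold, over several rounds.
def augment(image_ids, axis, threshold=0.3, rounds=2):
    kept = []
    for _ in range(rounds):
        for img in image_ids:
            for instr in generate_instructions(axis):
                sample = edit_image(img, caption(img), instr)
                if quality_score(sample) >= threshold:
                    kept.append(sample)
    return kept

synthetic = augment(["tile_001", "tile_002"], "snow")
print(len(synthetic))
```

The loop structure, not the stub logic, is the point: instructions condition generation, captions ground it in the source scene, and the evaluation gate decides which synthetic samples enter the training set.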