CrossEarth: Geospatial Vision Foundation Model for Domain Generalizable Remote Sensing Semantic Segmentation

📅 2024-10-30
🏛️ arXiv.org
📈 Citations: 10
Influential: 0
🤖 AI Summary
Problem: Semantic segmentation of remote sensing imagery generalizes poorly across domains. Existing cross-domain methods mostly adapt to predefined target domains, the few domain generalization models are built for specific unknown domains, and existing remote sensing foundation models prioritize in-domain performance. Method: We propose CrossEarth, the first vision foundation model for Remote Sensing Domain Generalization (RSDG) semantic segmentation. It combines two complementary pipelines: a data-level Earth-Style Injection pipeline and a model-level Multi-Task Training pipeline. Contribution/Results: We curate an RSDG benchmark comprising 28 cross-domain settings that span regions, spectral bands, platforms, and climates. Extensive experiments on this benchmark show that CrossEarth outperforms existing state-of-the-art methods.

📝 Abstract
The field of Remote Sensing Domain Generalization (RSDG) has emerged as a critical and valuable research frontier, focusing on developing models that generalize effectively across diverse scenarios. Despite the substantial domain gaps in RS images, which are characterized by variabilities such as location, wavelength, and sensor type, this area remains underexplored: (1) Current cross-domain methods primarily focus on Domain Adaptation (DA), which adapts models to predefined domains rather than to unseen ones; (2) Few studies target the RSDG issue, especially for semantic segmentation tasks, where existing models are developed for specific unknown domains and struggle with underfitting on other unknown scenarios; (3) Existing RS foundation models tend to prioritize in-domain performance over cross-domain generalization. To this end, we introduce the first vision foundation model for RSDG semantic segmentation, CrossEarth. CrossEarth demonstrates strong cross-domain generalization through a specially designed data-level Earth-Style Injection pipeline and a model-level Multi-Task Training pipeline. In addition, for the semantic segmentation task, we have curated an RSDG benchmark comprising 28 cross-domain settings across various regions, spectral bands, platforms, and climates, providing a comprehensive framework for testing the generalizability of future RSDG models. Extensive experiments on this benchmark demonstrate the superiority of CrossEarth over existing state-of-the-art methods.
Problem

Research questions and friction points this paper is trying to address.

Addressing domain generalization in remote sensing semantic segmentation across diverse scenarios
Overcoming limitations of current domain adaptation methods for unseen domains
Shifting remote sensing foundation models from prioritizing in-domain performance toward cross-domain generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Earth-Style Injection pipeline for data-level generalization
Multi-Task Training pipeline for model-level generalization
Comprehensive RSDG benchmark with 28 cross-domain settings