Cross-Sensor Self-Supervised Training and Alignment for Remote Sensing

📅 2024-05-16
🏛️ IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
📈 Citations: 1
Influential: 0
🤖 AI Summary
Remote sensing sensors (e.g., Sentinel-2 and aerial platforms) produce inconsistent feature representations, and high-resolution annotated data is scarce, which leads to poor cross-resolution generalization. To address these challenges, this paper proposes X-STARS, a cross-sensor self-supervised training and alignment framework. Its core innovation is the Multi-Sensor Alignment Dense (MSAD) loss, which aligns features across resolutions and platforms via contrastive image-patch matching. X-STARS supports both from-scratch pretraining and continual pretraining. Evaluated on the newly constructed multi-sensor Cities-France dataset, X-STARS consistently outperforms state-of-the-art methods across seven downstream classification and segmentation tasks. Moreover, it achieves comparable performance with 30–50% fewer annotated samples, significantly reducing annotation burden while improving model transferability across heterogeneous remote sensing modalities.

📝 Abstract
Large-scale “foundation models” have gained traction as a way to leverage the vast amounts of unlabeled remote sensing data collected every day. However, due to the multiplicity of Earth Observation (EO) satellites, these models should learn “sensor-agnostic” representations that generalize across sensor characteristics with minimal fine-tuning. This is complicated by data availability, as low-resolution imagery, such as Sentinel-2 and Landsat-8 data, is available in large amounts, while very high-resolution aerial or satellite data is less common. To better leverage multi-sensor data, we introduce cross-sensor self-supervised training and alignment for remote sensing (X-STARS). We design a self-supervised training loss, the Multi-Sensor Alignment Dense loss, to align representations across sensors, even with vastly different resolutions, through a contrastive patch-wise mechanism. X-STARS can be applied to train models from scratch, or to adapt large models pretrained on, e.g., low-resolution EO data to new high-resolution sensors in a continual pretraining framework. We collect and release Multi-Sensor Cities-France, a new multi-sensor dataset, on which we train our X-STARS models, which we then evaluate on seven downstream classification and segmentation tasks. We demonstrate that X-STARS outperforms the state of the art with less data across various conditions of data availability and resolution.
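To make the contrastive patch-wise mechanism concrete, the sketch below shows an InfoNCE-style alignment loss over per-patch embeddings from two sensors covering the same scene. This is a simplified illustration of the general idea, not the paper's exact MSAD loss: the function name, the numpy implementation, and the assumption that patch i in one sensor pairs with patch i in the other are all ours.

```python
import numpy as np

def patch_alignment_loss(feats_a, feats_b, temperature=0.1):
    """InfoNCE-style contrastive loss over co-located patch embeddings.

    feats_a, feats_b: (N, D) arrays of patch embeddings from two sensors
    observing the same N spatial patches. Row i of each array is treated
    as a positive pair; all other rows serve as negatives.
    NOTE: illustrative sketch only, not the paper's exact MSAD loss.
    """
    # L2-normalize so dot products become cosine similarities
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature               # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives lie on the diagonal: patch i in sensor A matches patch i in B
    return float(-np.mean(np.diag(log_probs)))

# Toy check: well-aligned features should score lower than random ones
rng = np.random.default_rng(0)
shared = rng.normal(size=(8, 16))
loss_aligned = patch_alignment_loss(shared, shared + 0.01 * rng.normal(size=(8, 16)))
loss_random = patch_alignment_loss(shared, rng.normal(size=(8, 16)))
print(loss_aligned < loss_random)
```

Minimizing such a loss pulls embeddings of the same ground patch together across sensors while pushing apart embeddings of different patches, which is what enables cross-resolution alignment.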
Problem

Research questions and friction points this paper is trying to address.

Develop sensor-agnostic models for remote sensing data
Align representations across varying resolution sensors
Improve model performance with limited high-resolution data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-sensor self-supervised training for remote sensing
Multi-Sensor Alignment Dense loss (MSAD)
Continual pretraining framework for multi-resolution adaptation