SOMA-1M: A Large-Scale SAR-Optical Multi-resolution Alignment Dataset for Multi-Task Remote Sensing

📅 2026-02-05

🤖 AI Summary

Existing remote sensing datasets are limited by single-resolution coverage, insufficient scale, and low alignment accuracy, which hinders the development of multi-scale, multi-modal foundation models. This work presents SOMA-1M, a large-scale dataset of more than 1.3 million SAR and optical image pairs aligned at the pixel level, spanning resolutions from 0.5 m to 10 m and covering 12 representative land-cover classes. The dataset integrates multi-source data from Sentinel-1, PIESAT-1, Capella Space, and Google Earth. A coarse-to-fine matching framework handles the two main obstacles to building it: multi-modal projection distortion and registration at massive scale. Benchmarks across four vision tasks show that supervised training on SOMA-1M significantly improves performance on every task, with state-of-the-art results in multi-modal image matching.

📝 Abstract

Synthetic Aperture Radar (SAR) and optical imagery have complementary strengths, and combining them is the basis for overcoming single-modality constraints and enabling cross-modal collaborative processing and intelligent interpretation. However, existing benchmark datasets often suffer from a single spatial resolution, insufficient data scale, and low alignment accuracy, which makes them inadequate for training multi-scale foundation models and evaluating their generalization. To address these challenges, we introduce SOMA-1M (SAR-Optical Multi-resolution Alignment), a dataset of over 1.3 million georeferenced image pairs, each 512 x 512 pixels and aligned at the pixel level. The dataset integrates imagery from Sentinel-1, PIESAT-1, Capella Space, and Google Earth, achieving global multi-scale coverage from 0.5 m to 10 m. It encompasses 12 typical land cover categories, ensuring scene diversity and complexity. To handle multimodal projection deformation and the registration of massive data volumes, we designed a rigorous coarse-to-fine image matching framework that ensures pixel-level alignment. Based on this dataset, we established comprehensive evaluation benchmarks for four hierarchical vision tasks (image matching, image fusion, SAR-assisted cloud removal, and cross-modal translation) involving over 30 mainstream algorithms. Experimental results demonstrate that supervised training on SOMA-1M significantly enhances performance across all tasks. Notably, multimodal remote sensing image (MRSI) matching reaches current state-of-the-art (SOTA) performance. SOMA-1M serves as a foundational resource for robust multimodal algorithms and remote sensing foundation models. The dataset will be released publicly at: https://github.com/PeihaoWu/SOMA-1M.
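
To give a sense of what the multi-scale coverage means in practice, the short sketch below works out the ground footprint of one 512 x 512 tile at the two ends of the stated resolution range. It assumes that the quoted resolutions (0.5 m and 10 m) are ground sample distances per pixel and that pixels are square; neither is confirmed by the abstract, and the function name `tile_footprint_m` is purely illustrative.

```python
# Illustrative arithmetic only: assumes "resolution" means ground sample
# distance (metres per pixel) and square pixels. Not taken from the dataset's code.

TILE_SIZE_PX = 512  # tile side length stated in the abstract


def tile_footprint_m(gsd_m: float, tile_size_px: int = TILE_SIZE_PX) -> float:
    """Return the ground length in metres covered by one tile side."""
    return gsd_m * tile_size_px


if __name__ == "__main__":
    # The two endpoints of the resolution range stated in the abstract.
    for gsd_m in (0.5, 10.0):
        side_m = tile_footprint_m(gsd_m)
        area_km2 = (side_m / 1000.0) ** 2
        print(f"GSD {gsd_m:>4} m -> {side_m:.0f} m per side, {area_km2:.4f} km^2 per tile")
```

Under these assumptions, a tile spans about 256 m per side at the finest resolution and about 5.12 km per side at the coarsest, so a single model trained on the dataset sees scenes whose ground extent differs by a factor of 20.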

Problem

Research questions and friction points this paper is trying to address.

SAR-optical alignment

multi-resolution remote sensing

large-scale dataset

pixel-level alignment

multimodal remote sensing

Innovation

Methods, ideas, or system contributions that make the work stand out.

SAR-optical alignment

multi-resolution remote sensing

pixel-level registration