SOMA-1M: A Large-Scale SAR-Optical Multi-resolution Alignment Dataset for Multi-Task Remote Sensing

📅 2026-02-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses limitations of existing remote sensing datasets (single-resolution coverage, insufficient scale, and low alignment accuracy) that hinder the development of multi-scale, multi-modal foundation models. We present the first large-scale dataset comprising over 1.3 million SAR and optical image pairs aligned at the pixel level, spanning resolutions from 0.5 m to 10 m and covering 12 representative land-cover classes. The dataset integrates multi-source data from Sentinel-1, PIESAT-1, Capella Space, and Google Earth. A coarse-to-fine matching framework is introduced to handle multi-modal projection distortions and registration at massive scale. Benchmarks across four vision tasks show significant performance gains from training on the dataset, with state-of-the-art results in multi-modal matching, filling a gap in high-precision, large-scale multi-modal remote sensing datasets.
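The summary does not describe how the coarse-to-fine framework works internally, so the following is only a generic illustration of the coarse-to-fine idea, not the authors' method. It assumes a pure integer translation between two same-size single-band images and uses plain NumPy phase correlation: a shift is first estimated on a downsampled copy, then the residual is estimated at full resolution. All function names (`phase_correlation_shift`, `block_mean`, `coarse_to_fine_shift`) are hypothetical. Real SAR-optical pairs differ radiometrically and geometrically, so a production pipeline would need modality-robust features and a richer transform model.

```python
import numpy as np

def phase_correlation_shift(ref, mov):
    """Return integer (dy, dx) such that mov ~= np.roll(ref, (dy, dx), axis=(0, 1))."""
    cross = np.fft.fft2(mov) * np.conj(np.fft.fft2(ref))
    cross /= np.abs(cross) + 1e-12  # keep only the phase
    corr = np.real(np.fft.ifft2(cross))
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks past the midpoint correspond to negative (wrapped) shifts.
    return tuple(int(p) if p <= n // 2 else int(p) - n
                 for p, n in zip(peak, corr.shape))

def block_mean(img, factor):
    """Downsample by averaging non-overlapping factor x factor blocks."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def coarse_to_fine_shift(ref, mov, factor=4):
    """Estimate a translation coarsely, undo it, then estimate the residual."""
    cy, cx = phase_correlation_shift(block_mean(ref, factor),
                                     block_mean(mov, factor))
    coarse = (cy * factor, cx * factor)
    # Undo the coarse shift so that only a small residual remains.
    undone = np.roll(mov, shift=(-coarse[0], -coarse[1]), axis=(0, 1))
    ry, rx = phase_correlation_shift(ref, undone)
    return coarse[0] + ry, coarse[1] + rx

# Toy usage: a synthetic image and a circularly shifted copy of it.
rng = np.random.default_rng(0)
ref = rng.standard_normal((128, 128))
mov = np.roll(ref, shift=(13, -18), axis=(0, 1))
estimated = coarse_to_fine_shift(ref, mov)
print(estimated)
```

In this toy setting the fine stage is itself a global phase correlation, so it recovers the residual left by the coarse stage. In practice the fine stage would more likely search a small local window around the coarse estimate.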

📝 Abstract
Synthetic Aperture Radar (SAR) and optical imagery have complementary strengths, which form the foundation for overcoming single-modality constraints and enabling cross-modal collaborative processing and intelligent interpretation. However, existing benchmark datasets often suffer from a single spatial resolution, insufficient data scale, and low alignment accuracy, making them inadequate for training multi-scale foundation models and supporting their generalization. To address these challenges, we introduce SOMA-1M (SAR-Optical Multi-resolution Alignment), a dataset of over 1.3 million georeferenced image pairs, each 512 × 512 pixels and aligned at the pixel level. The dataset integrates imagery from Sentinel-1, PIESAT-1, Capella Space, and Google Earth, achieving global multi-scale coverage from 0.5 m to 10 m. It encompasses 12 typical land cover categories, ensuring scene diversity and complexity. To handle multimodal projection deformation and the registration of massive data volumes, we designed a rigorous coarse-to-fine image matching framework that ensures pixel-level alignment. Based on this dataset, we established comprehensive evaluation benchmarks for four hierarchical vision tasks (image matching, image fusion, SAR-assisted cloud removal, and cross-modal translation), involving over 30 mainstream algorithms. Experimental results demonstrate that supervised training on SOMA-1M significantly enhances performance across all tasks. Notably, multimodal remote sensing image (MRSI) matching reaches current state-of-the-art (SOTA) performance. SOMA-1M serves as a foundational resource for robust multimodal algorithms and remote sensing foundation models. The dataset will be released publicly at: https://github.com/PeihaoWu/SOMA-1M.
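To make the multi-scale coverage concrete, the short calculation below estimates the ground footprint of a single 512 × 512 tile at the two ends of the stated resolution range. It is a back-of-the-envelope sketch: it assumes square pixels and that the quoted resolution equals the ground sample distance, and the helper name `tile_footprint_m` is hypothetical.

```python
def tile_footprint_m(tile_px: int, gsd_m: float) -> float:
    """Side length in metres of a square tile (assumes square pixels)."""
    return tile_px * gsd_m

TILE_PX = 512  # tile size stated in the abstract

for gsd in (0.5, 10.0):  # resolution extremes stated in the abstract
    side_m = tile_footprint_m(TILE_PX, gsd)
    area_km2 = (side_m / 1000.0) ** 2
    print(f"GSD {gsd} m: side {side_m} m, area {area_km2:.4f} km^2")
```

Under these assumptions a tile spans 256 m per side at 0.5 m and 5120 m per side at 10 m, a 20× difference in linear extent and 400× in area.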
Problem

Research questions and friction points this paper is trying to address.

SAR-optical alignment
multi-resolution remote sensing
large-scale dataset
pixel-level alignment
multimodal remote sensing
Innovation

Methods, ideas, or system contributions that make the work stand out.

SAR-optical alignment
multi-resolution remote sensing
pixel-level registration
multimodal foundation model
large-scale dataset
Authors
Peihao Wu
School of Remote Sensing Information Engineering, Wuhan University, Wuhan, 430079, China
Yongxiang Yao
School of Remote Sensing Information Engineering, Wuhan University, Wuhan, 430079, China; Hubei LuoJia Laboratory, Wuhan, 430079, China; Technology Innovation Center for Collaborative Applications of Natural Resources Data in GBA, Ministry of Natural Resources, Guangzhou, 510075, China
Y. Wan
School of Remote Sensing Information Engineering, Wuhan University, Wuhan, 430079, China; Hubei LuoJia Laboratory, Wuhan, 430079, China; Technology Innovation Center for Collaborative Applications of Natural Resources Data in GBA, Ministry of Natural Resources, Guangzhou, 510075, China
Wenfei Zhang
School of Remote Sensing Information Engineering, Wuhan University, Wuhan, 430079, China
Ruipeng Zhao
School of Remote Sensing Information Engineering, Wuhan University, Wuhan, 430079, China
Jiayuan Li
Wuhan University
remote sensing, image processing, computer vision
Yongjun Zhang
Wuhan University
Photogrammetry, Remote Sensing, Computer Vision