NeighborMAE: Exploiting Spatial Dependencies between Neighboring Earth Observation Images in Masked Autoencoders Pretraining

📅 2026-03-02
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work addresses a limitation of existing masked autoencoders for remote sensing image pretraining: they often neglect spatial dependencies between neighboring regions, which constrains representation learning. To overcome this, the study introduces, for the first time, spatial contextual information from adjacent remote sensing images into the masked autoencoder framework. The proposed approach strengthens the model's ability to capture complex land-cover structures through multi-image joint reconstruction, dynamic masking-ratio adjustment, and a pixel-level loss weighting mechanism. Extensive experiments show that the method significantly outperforms current baselines across multiple remote sensing pretraining datasets and downstream tasks, validating the effectiveness and generalizability of leveraging spatial dependencies for self-supervised representation learning.

πŸ“ Abstract
Masked Image Modeling has been one of the most popular self-supervised learning paradigms for learning representations from large-scale, unlabeled Earth Observation images. While incorporating multi-modal and multi-temporal Earth Observation data into Masked Image Modeling has been widely explored, the spatial dependencies between images captured from neighboring areas remain largely overlooked. Since the Earth's surface is continuous, neighboring images are highly related and offer rich contextual information for self-supervised learning. To close this gap, we propose NeighborMAE, which learns spatial dependencies through joint reconstruction of neighboring Earth Observation images. To keep the reconstruction challenging, we leverage a heuristic strategy to dynamically adjust the mask ratio and the pixel-level loss weight. Experimental results across various pretraining datasets and downstream tasks show that NeighborMAE significantly outperforms existing baselines, underscoring the value of neighboring images in Masked Image Modeling for Earth Observation and the efficacy of our designs.
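The three ingredients named in the abstract (joint reconstruction over neighboring images, a dynamically adjusted mask ratio, and a pixel-level loss weight) can be sketched roughly as below. This is a minimal illustration, not the paper's implementation: the function names, array shapes, and the threshold-based update rule are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def joint_masked_loss(images, recon, mask_ratio, pixel_weights):
    """Weighted MSE over randomly masked patches of neighboring images.

    Assumed shapes: `images` and `recon` are (N, P, D) arrays of N
    neighboring images, each split into P patch vectors of dimension D;
    `pixel_weights` is (N, P), one loss weight per patch.
    """
    n, p, _ = images.shape
    num_masked = max(1, int(mask_ratio * p))
    loss = 0.0
    for i in range(n):
        # Mask a random subset of patches in each neighboring image.
        masked = rng.choice(p, size=num_masked, replace=False)
        err = (recon[i, masked] - images[i, masked]) ** 2
        loss += float((pixel_weights[i, masked, None] * err).mean())
    return loss / n

def dynamic_mask_ratio(ratio, last_loss, target_loss, step=0.05):
    """Assumed heuristic: mask more when reconstruction got too easy."""
    if last_loss < target_loss:
        return min(ratio + step, 0.95)
    return max(ratio - step, 0.40)

# Toy usage: a perfect reconstruction yields zero loss.
imgs = rng.normal(size=(3, 16, 8))   # 3 neighbors, 16 patches, dim 8
weights = np.ones((3, 16))
print(joint_masked_loss(imgs, imgs.copy(), 0.75, weights))  # 0.0
```

In an actual pretraining loop, the loss would be computed on the decoder's output rather than a copy of the input, and `dynamic_mask_ratio` would be called once per epoch (or step) to reschedule the masking difficulty.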
Problem

Research questions and friction points this paper is trying to address.

Masked Image Modeling
Earth Observation
Spatial Dependencies
Self-supervised Learning
Neighboring Images
Innovation

Methods, ideas, or system contributions that make the work stand out.

NeighborMAE
spatial dependencies
masked image modeling
Earth Observation
self-supervised learning