🤖 AI Summary
This paper addresses the pervasive data incompleteness in remote sensing imagery, caused by cloud cover, occlusion, and sensor failures, by systematically surveying masked image modeling (MIM) for self-supervised pretraining in this domain. It establishes the first comprehensive taxonomy of remote sensing MIM methodologies, clarifying masking strategies (pixel-, patch-, and feature-level), architectural evolution (Transformer- vs. CNN-based), multi-source data fusion techniques, and downstream task adaptation (e.g., cloud removal, super-resolution). Synthesizing over 100 state-of-the-art works, it identifies key performance bottlenecks and proposes a unified evaluation protocol. The study further outlines three critical future directions: scalable pretraining, cross-modal alignment, and physics-informed modeling. Collectively, this work formalizes the first systematic research framework for MIM in remote sensing, bridging methodological rigor with practical applicability.
📝 Abstract
Masked Image Modeling (MIM) is a self-supervised learning technique that masks portions of an image, such as pixels, patches, or latent representations, and trains models to predict the missing information from the visible context. This approach has emerged as a cornerstone of self-supervised pre-training, unlocking new possibilities in visual understanding by leveraging unannotated data. In remote sensing, MIM addresses challenges such as incomplete data caused by cloud cover, occlusions, and sensor limitations, enabling applications like cloud removal, multi-modal data fusion, and super-resolution. By synthesizing and critically analyzing recent advancements, this survey (MIMRS) is a pioneering effort to chart the landscape of masked image modeling in remote sensing. We highlight state-of-the-art methodologies, applications, and future research directions, providing a foundational review to guide innovation in this rapidly evolving field.
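To make the mask-and-reconstruct recipe concrete, the sketch below shows MAE-style random patch masking in plain NumPy. It is a minimal illustration under stated assumptions, not the survey's method: the `random_patch_mask` helper, the 16-pixel patch size, the 0.75 mask ratio, and the synthetic 4-band tile are all hypothetical choices for demonstration. A real pretraining pipeline would feed only the visible patches to an encoder and train a decoder to reconstruct the masked ones.

```python
import numpy as np

def random_patch_mask(image, patch_size=16, mask_ratio=0.75, seed=None):
    """Split an image into non-overlapping patches and randomly hide a fraction.

    Returns the flattened patches, a boolean mask (True = masked/hidden),
    and the indices of the visible patches that an encoder would see.
    (Illustrative sketch; parameter choices are assumptions, not from the survey.)
    """
    rng = np.random.default_rng(seed)
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0, "image must tile evenly"

    # Reshape into (num_patches, patch_size * patch_size * channels)
    patches = (
        image.reshape(h // patch_size, patch_size, w // patch_size, patch_size, c)
             .transpose(0, 2, 1, 3, 4)
             .reshape(-1, patch_size * patch_size * c)
    )

    num_patches = patches.shape[0]
    num_masked = int(round(mask_ratio * num_patches))

    # Randomly choose which patches are hidden from the encoder
    perm = rng.permutation(num_patches)
    masked_idx, visible_idx = perm[:num_masked], perm[num_masked:]

    mask = np.zeros(num_patches, dtype=bool)
    mask[masked_idx] = True
    return patches, mask, visible_idx

# Example: a synthetic 4-band tile (e.g., RGB + NIR) standing in for a satellite image
tile = np.random.rand(224, 224, 4).astype(np.float32)
patches, mask, visible_idx = random_patch_mask(tile, patch_size=16, mask_ratio=0.75, seed=0)
print(patches.shape, int(mask.sum()), len(visible_idx))  # (196, 1024) 147 49
```

The same masking idea transfers to the remote sensing settings surveyed here; what changes is the choice of masking granularity (pixel, patch, or feature level) and the reconstruction target.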