🤖 AI Summary
This paper presents a systematic review of vision foundation models in remote sensing, addressing core challenges including multimodal data heterogeneity, severe label scarcity, and scene complexity. To tackle these issues, the authors propose, for the first time, a dedicated taxonomy for remote sensing foundation models, structured along three dimensions: architectural design (e.g., Transformer-CNN hybrids), pretraining paradigms (e.g., contrastive learning, masked autoencoding), and data sources (e.g., multi-source imagery, cross-modal alignment). Synthesizing over 100 state-of-the-art studies, the work demonstrates that self-supervised pretraining substantially enhances model robustness. It further identifies three critical research frontiers: (1) construction of high-quality, domain-specific remote sensing datasets; (2) model lightweighting for efficient on-device deployment; and (3) improvement of cross-scene generalization capability. Collectively, this study delivers the first comprehensive technical roadmap to advance the practical deployment of AI in remote sensing.
📄 Abstract
Artificial Intelligence (AI) technologies have profoundly transformed the field of remote sensing, revolutionizing data collection, processing, and analysis. Traditionally reliant on manual interpretation and task-specific models, remote sensing research has been significantly enhanced by the advent of foundation models: large-scale, pre-trained AI models capable of performing a wide array of tasks with unprecedented accuracy and efficiency. This paper provides a comprehensive survey of foundation models in the remote sensing domain. We categorize these models based on their architectures, pre-training datasets, and methodologies. Through detailed performance comparisons, we highlight emerging trends and the significant advancements achieved by these foundation models. Additionally, we discuss technical challenges, practical implications, and future research directions, addressing the need for high-quality data, computational resources, and improved model generalization. Our research also finds that pre-training methods, particularly self-supervised learning techniques such as contrastive learning and masked autoencoders, remarkably enhance the performance and robustness of foundation models. This survey aims to serve as a resource for researchers and practitioners by providing a panorama of advances and promising pathways for the continued development and application of foundation models in remote sensing.