🤖 AI Summary
Fine-grained annotations of archaeological features are scarce for 8-band multispectral remote sensing imagery of the Andean region, severely limiting the generalization of supervised models. Method: This paper introduces DeepAndes, the first region-specific multispectral vision foundation model for the Andes. It adapts the DINOv2 self-supervised framework to 8-band inputs, overcoming the transfer bottleneck that RGB-pretrained models face in remote sensing. Built on a Transformer architecture, the model is evaluated on imbalanced classification, instance retrieval, and pixel-level semantic segmentation tasks. Contribution/Results: In few-shot settings, the model significantly outperforms both training from scratch and small-scale pretrained baselines, achieving superior F1 score, mean average precision (mAP), and Dice coefficient. These results empirically validate the effectiveness and necessity of large-scale multispectral self-supervised pretraining for archaeological remote sensing interpretation.
📝 Abstract
By mapping sites at large scales using remotely sensed data, archaeologists can generate unique insights into long-term demographic trends, inter-regional social networks, and past adaptations to climate change. Remote sensing surveys complement field-based approaches, and their reach can be especially great when combined with deep learning and computer vision techniques. However, conventional supervised deep learning methods face challenges in annotating fine-grained archaeological features at scale. While recent vision foundation models have shown remarkable success in learning large-scale remote sensing data with minimal annotations, most off-the-shelf solutions are designed for RGB images rather than multi-spectral satellite imagery, such as the 8-band data used in our study. In this paper, we introduce DeepAndes, a transformer-based vision foundation model trained on three million multi-spectral satellite images, specifically tailored for Andean archaeology. DeepAndes incorporates a customized DINOv2 self-supervised learning algorithm optimized for 8-band multi-spectral imagery, marking the first foundation model designed explicitly for the Andes region. We evaluate its image understanding performance through imbalanced image classification, image instance retrieval, and pixel-level semantic segmentation tasks. Our experiments show that DeepAndes achieves superior F1 scores, mean average precision, and Dice scores in few-shot learning scenarios, significantly outperforming models trained from scratch or pre-trained on smaller datasets. This underscores the effectiveness of large-scale self-supervised pre-training in archaeological remote sensing. Code will be available at https://github.com/geopacha/DeepAndes.
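A core engineering point above is adapting a ViT-based model like DINOv2, whose patch-embedding layer expects 3 RGB channels, to 8-band multispectral input. One common heuristic (not necessarily the exact initialization used by DeepAndes, which the abstract does not specify) is to "inflate" the pretrained patch-embedding kernels: average them over the RGB channels, replicate the average across the new bands, and rescale so the layer's output magnitude stays comparable. A minimal NumPy sketch of that heuristic, with the function name `inflate_patch_weights` being illustrative:

```python
import numpy as np

def inflate_patch_weights(w_rgb: np.ndarray, in_chans: int = 8) -> np.ndarray:
    """Inflate pretrained RGB patch-embedding kernels to `in_chans` bands.

    w_rgb: array of shape (D, 3, k, k) -- D output dims, 3 input channels,
           k x k patch kernels (e.g. the weight of a ViT patch-embed conv).
    Returns an array of shape (D, in_chans, k, k).

    Heuristic: each new band gets the mean of the RGB kernels, rescaled by
    3 / in_chans so the total response to a constant input is unchanged.
    """
    mean_w = w_rgb.mean(axis=1, keepdims=True)      # (D, 1, k, k)
    w_ms = np.repeat(mean_w, in_chans, axis=1)      # (D, in_chans, k, k)
    return w_ms * (3.0 / in_chans)

# Example: inflate a toy (D=4, 3-channel, 2x2-patch) kernel to 8 bands.
w_rgb = np.ones((4, 3, 2, 2))
w_ms = inflate_patch_weights(w_rgb, in_chans=8)
# The summed response over input channels is preserved:
# w_ms.sum(axis=1) == w_rgb.sum(axis=1) elementwise.
```

The rescaling keeps a constant-valued input producing the same pre-activation as before inflation, which tends to preserve the pretrained statistics downstream; alternatives include copying RGB weights into the first three bands and randomly initializing the rest.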