🤖 AI Summary
Fine-grained annotations of archaeological features are scarce for 8-band multispectral remote sensing imagery of the Andean region, severely limiting the generalization of supervised models. Method: This paper introduces DeepAndes, the first region-specific multispectral vision foundation model for the Andes. It adapts the DINOv2 self-supervised framework to 8-band inputs, overcoming the transfer bottleneck that RGB-pretrained models face in remote sensing. Built on a Transformer architecture, the model is evaluated on imbalanced classification, instance retrieval, and pixel-level semantic segmentation tasks. Contribution/Results: In few-shot settings, the model significantly outperforms both training from scratch and small-scale pretrained baselines, achieving superior F1 score, mean average precision (mAP), and Dice coefficient. These results empirically validate the effectiveness and necessity of large-scale multispectral self-supervised pretraining for archaeological remote sensing interpretation.
📝 Abstract
By mapping sites at large scales using remotely sensed data, archaeologists can generate unique insights into long-term demographic trends, inter-regional social networks, and past adaptations to climate change. Remote sensing surveys complement field-based approaches, and their reach can be especially great when combined with deep learning and computer vision techniques. However, conventional supervised deep learning methods face challenges in annotating fine-grained archaeological features at scale. While recent vision foundation models have shown remarkable success in learning large-scale remote sensing data with minimal annotations, most off-the-shelf solutions are designed for RGB images rather than multi-spectral satellite imagery, such as the 8-band data used in our study. In this paper, we introduce DeepAndes, a transformer-based vision foundation model trained on three million multi-spectral satellite images, specifically tailored for Andean archaeology. DeepAndes incorporates a customized DINOv2 self-supervised learning algorithm optimized for 8-band multi-spectral imagery, marking the first foundation model designed explicitly for the Andes region. We evaluate its image understanding performance through imbalanced image classification, image instance retrieval, and pixel-level semantic segmentation tasks. Our experiments show that DeepAndes achieves superior F1 scores, mean average precision, and Dice scores in few-shot learning scenarios, significantly outperforming models trained from scratch or pre-trained on smaller datasets. This underscores the effectiveness of large-scale self-supervised pre-training in archaeological remote sensing. Code will be available at https://github.com/geopacha/DeepAndes.
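A core engineering point above is adapting a ViT-based model like DINOv2, whose patch-embedding layer expects 3 RGB channels, to 8-band multispectral input. One common heuristic (not necessarily the exact initialization used by DeepAndes, which the abstract does not specify) is to "inflate" the pretrained patch-embedding kernels: average them over the RGB channels, replicate the average across the new bands, and rescale so the layer's output magnitude stays comparable. A minimal NumPy sketch of that heuristic, with the function name `inflate_patch_weights` being illustrative:

```python
import numpy as np

def inflate_patch_weights(w_rgb: np.ndarray, in_chans: int = 8) -> np.ndarray:
    """Inflate pretrained RGB patch-embedding kernels to `in_chans` bands.

    w_rgb: array of shape (D, 3, k, k) -- D output dims, 3 input channels,
           k x k patch kernels (e.g. the weight of a ViT patch-embed conv).
    Returns an array of shape (D, in_chans, k, k).

    Heuristic: each new band gets the mean of the RGB kernels, rescaled by
    3 / in_chans so the total response to a constant input is unchanged.
    """
    mean_w = w_rgb.mean(axis=1, keepdims=True)      # (D, 1, k, k)
    w_ms = np.repeat(mean_w, in_chans, axis=1)      # (D, in_chans, k, k)
    return w_ms * (3.0 / in_chans)

# Example: inflate a toy (D=4, 3-channel, 2x2-patch) kernel to 8 bands.
w_rgb = np.ones((4, 3, 2, 2))
w_ms = inflate_patch_weights(w_rgb, in_chans=8)
# The summed response over input channels is preserved:
# w_ms.sum(axis=1) == w_rgb.sum(axis=1) elementwise.
```

The rescaling keeps a constant-valued input producing the same pre-activation as before inflation, which tends to preserve the pretrained statistics downstream; alternatives include copying RGB weights into the first three bands and randomly initializing the rest.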