DeepAndes: A Self-Supervised Vision Foundation Model for Multi-Spectral Remote Sensing Imagery of the Andes

📅 2025-04-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Fine-grained archaeological feature annotations are scarce for 8-band multispectral remote sensing imagery of the Andean region, which severely limits the generalization of supervised models. Method: This paper introduces the first region-specific multispectral vision foundation model for the Andes. It adapts the DINOv2 self-supervised framework to 8-band inputs, sidestepping the transfer bottleneck that RGB-pretrained models face in remote sensing. The transformer-based model is evaluated on imbalanced image classification, image instance retrieval, and pixel-level semantic segmentation. Contribution/Results: Under few-shot settings, the model significantly outperforms both training from scratch and baselines pretrained on smaller datasets, achieving higher F1 score, mean average precision (mAP), and Dice coefficient. These results support the effectiveness, and the necessity, of large-scale multispectral self-supervised pretraining for archaeological remote sensing interpretation.

📝 Abstract
By mapping sites at large scales using remotely sensed data, archaeologists can generate unique insights into long-term demographic trends, inter-regional social networks, and past adaptations to climate change. Remote sensing surveys complement field-based approaches, and their reach can be especially great when combined with deep learning and computer vision techniques. However, conventional supervised deep learning methods face challenges in annotating fine-grained archaeological features at scale. While recent vision foundation models have shown remarkable success in learning large-scale remote sensing data with minimal annotations, most off-the-shelf solutions are designed for RGB images rather than multi-spectral satellite imagery, such as the 8-band data used in our study. In this paper, we introduce DeepAndes, a transformer-based vision foundation model trained on three million multi-spectral satellite images, specifically tailored for Andean archaeology. DeepAndes incorporates a customized DINOv2 self-supervised learning algorithm optimized for 8-band multi-spectral imagery, marking the first foundation model designed explicitly for the Andes region. We evaluate its image understanding performance through imbalanced image classification, image instance retrieval, and pixel-level semantic segmentation tasks. Our experiments show that DeepAndes achieves superior F1 scores, mean average precision, and Dice scores in few-shot learning scenarios, significantly outperforming models trained from scratch or pre-trained on smaller datasets. This underscores the effectiveness of large-scale self-supervised pre-training in archaeological remote sensing. Code will be available at https://github.com/geopacha/DeepAndes.

Problem

Research questions and friction points this paper is trying to address.

Challenges in annotating fine-grained archaeological features at scale
Lack of vision foundation models for multi-spectral satellite imagery
Need for region-specific self-supervised learning in Andean archaeology
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based model for multi-spectral imagery
Customized DINOv2 self-supervised learning algorithm
Optimized for 8-band Andean archaeology data
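The summary and abstract say DINOv2 is adapted to 8-band inputs, but they do not spell out how an RGB patch embedding is extended to eight channels. A common heuristic for this kind of adaptation (assumed here for illustration; it is not stated to be DeepAndes' method) is to tile the pretrained 3-channel weights across the new channels and rescale so activation magnitudes stay comparable. A minimal NumPy sketch; `inflate_patch_embed` and the weight shapes are hypothetical:

```python
import numpy as np

def inflate_patch_embed(w_rgb: np.ndarray, in_chans: int = 8) -> np.ndarray:
    """Tile a 3-channel patch-embedding weight tensor (out, 3, kh, kw)
    across `in_chans` input channels, then rescale so the summed response
    over channels keeps roughly the same magnitude as the RGB original.
    This is a common channel-inflation heuristic, not necessarily the
    initialization DeepAndes uses."""
    out, c, kh, kw = w_rgb.shape
    assert c == 3, "expects an RGB-pretrained weight tensor"
    reps = -(-in_chans // c)                       # ceiling division
    w = np.tile(w_rgb, (1, reps, 1, 1))[:, :in_chans]
    return w * (c / in_chans)                      # preserve activation scale

# Hypothetical ViT-style patch embedding: 768 filters over 14x14 patches
w_rgb = np.random.randn(768, 3, 14, 14).astype(np.float32)
w_ms = inflate_patch_embed(w_rgb, in_chans=8)
print(w_ms.shape)  # (768, 8, 14, 14)
```

The rescaling factor keeps the expected sum over input channels close to the RGB case, so downstream layer statistics are not disrupted before self-supervised fine-tuning adapts the weights to the extra bands.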
👥 Authors
Junlin Guo
Vanderbilt University
Deep Learning, Foundation Models, Medical Image Analysis, Remote Sensing
James Zimmer-Dauphinee
Vanderbilt University
Archaeology, Remote Sensing, Geophysics
Jordan M. Nieusma
Data Science Institute, Vanderbilt University, Nashville, TN, USA
Siqi Lu
College of William and Mary
Computer Vision, Machine Learning, Medical Imaging
Quan Liu
Department of Computer Science, Vanderbilt University, Nashville, TN, USA
Ruining Deng
Weill Cornell Medicine
Medical Image Analysis, Deep Learning, Digital Pathology
Can Cui
Department of Computer Science, Vanderbilt University, Nashville, TN, USA
Jialin Yue
Department of Electrical and Computer Engineering, Vanderbilt University, Nashville, TN, USA
Yizhe Lin
Department of Mathematics, Vanderbilt University, Nashville, TN, USA
Tianyuan Yao
Vanderbilt University
Machine Learning, Medical Image Processing
Juming Xiong
Vanderbilt University
Deep Learning, Computer Vision, Medical Image Processing
Junchao Zhu
Vanderbilt University
Chongyu Qu
Vanderbilt University
Computer Vision, Deep Learning, Medical Image Analysis
Yuechen Yang
Vanderbilt University
Medical Image Analysis
Mitchell Wilkes
Department of Electrical and Computer Engineering, Vanderbilt University, Nashville, TN, USA
Xiao Wang
Oak Ridge National Laboratory, Oak Ridge, TN, USA
P. VanValkenburgh
Department of Anthropology, Brown University, Providence, RI, USA
Steven A. Wernke
Professor of Anthropology, Vanderbilt University
Archaeology, Ethnohistory, Andes, Colonial Period, Spatial Analysis
Yuankai Huo
Computer Science, Vanderbilt University
Medical Image Analysis, Deep Learning, Data Mining