🤖 AI Summary
This study addresses the challenge of dating disturbance events at archaeological sites to the month, which is difficult because visual cues are subtle and labels are sparse. Using PlanetScope satellite mosaics, the authors present WATCH, a framework with three complementary monthly scoring approaches: a training-free Temporal Embedding Distance (TED), a Self-Supervised Change Detection (SSCD) ensemble, and a weakly supervised temporal localization model. They benchmark embeddings from six foundation models (CLIP, GeoRSCLIP, SatMAE, Prithvi-EO-2.0, DINOv3, and Satlas-Pretrain) against a handcrafted spectral and texture baseline. Evaluated on 1,943 sites in Afghanistan, TED with SatMAE achieves the highest exact-month recall (55%), while TED with GeoRSCLIP, CLIP, or Satlas-Pretrain reaches 92.5% recall within a three-month tolerance. The unsupervised approaches consistently outperform the weakly supervised one. SSCD paired with GeoRSCLIP or Prithvi-EO-2.0 shows the strongest early-warning profile, whereas TED tends to flag a change after it has materialized.
📝 Abstract
Monitoring archaeological sites at scale is vital for protecting cultural heritage, yet pinpointing when disturbances occur remains difficult because visual cues are subtle and ground-truth data are sparse. We introduce WATCH, a framework for month-level change-event localization over PlanetScope satellite mosaics (2017-2024, 4.7 m/px) that supports three complementary scoring approaches: (i) Temporal Embedding Distance (TED), a training-free method that scores month-to-month deviations from a local temporal reference; (ii) Self-Supervised Change Detection (SSCD), an ensemble of reconstruction, forecasting, and latent-novelty signals; and (iii) a Weakly Supervised (WS) temporal localization model trained with sparse event-month labels. We benchmark WATCH on 1,943 archaeological sites in Afghanistan using embeddings from six foundation models (CLIP, GeoRSCLIP, SatMAE, Prithvi-EO-2.0, DINOv3, and Satlas-Pretrain) alongside a handcrafted spectral and texture baseline, and assess cross-regional generalization on sites in Syria, Turkey, Pakistan, and Egypt. The unsupervised approaches (TED, SSCD) consistently outperform the weakly supervised alternative. TED with SatMAE achieves the highest exact-month recall (55% at m=0), while TED with GeoRSCLIP, CLIP, or Satlas-Pretrain reaches 92.5% within a three-month tolerance (m=3). Handcrafted features remain competitive for exact-month detection under weak supervision. Our directional margin analysis reveals systematic temporal biases: SSCD paired with GeoRSCLIP or Prithvi-EO-2.0 exhibits the strongest early-warning profile, detecting anomalies before the recorded event, while TED favors confirmation-oriented detection after a change has materialized. These results show that satellite imagery combined with foundation-model embeddings enables scalable, decision-relevant heritage monitoring. Code: https://github.com/microsoft/WATCH
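To make the TED idea and the margin-based recall concrete, here is a minimal Python sketch. It is not taken from the WATCH repository: the function names `ted_scores` and `recall_at_margin`, the trailing-window mean used as the "local temporal reference", the cosine distance, and the default window of three months are all assumptions chosen for illustration. The abstract does not specify these details.

```python
import numpy as np


def ted_scores(embeddings, window=3):
    """Hypothetical TED-style score: cosine distance between each month's
    embedding and the mean embedding of the preceding `window` months.
    (Assumed reference and distance; the paper's exact choices may differ.)"""
    emb = np.asarray(embeddings, dtype=float)
    # L2-normalise each monthly embedding; the epsilon avoids division by zero.
    emb = emb / (np.linalg.norm(emb, axis=1, keepdims=True) + 1e-12)
    scores = np.zeros(len(emb))
    for t in range(1, len(emb)):
        # Local temporal reference: mean of up to `window` earlier months.
        ref = emb[max(0, t - window):t].mean(axis=0)
        ref = ref / (np.linalg.norm(ref) + 1e-12)
        scores[t] = 1.0 - float(emb[t] @ ref)
    return scores


def recall_at_margin(predicted_months, event_months, m):
    """Fraction of sites whose predicted month index lies within m months
    of the recorded event month (m=0 means an exact-month match)."""
    predicted = np.asarray(predicted_months)
    actual = np.asarray(event_months)
    return float(np.mean(np.abs(predicted - actual) <= m))


if __name__ == "__main__":
    # Synthetic site: embedding is stable for months 0-6, then shifts at month 7.
    series = np.array([[1.0, 0.0, 0.0]] * 7 + [[0.0, 1.0, 0.0]] * 5)
    scores = ted_scores(series)
    print("predicted event month:", int(np.argmax(scores)))

    # Toy predictions for three sites versus their recorded event months.
    print("recall at m=0:", recall_at_margin([7, 10, 3], [7, 8, 3], 0))
    print("recall at m=3:", recall_at_margin([7, 10, 3], [7, 8, 3], 3))
```

Taking the month with the highest score as the predicted event month is likewise an assumption. Under this scoring, the peak falls on the first month that departs from its recent history, which is consistent with the abstract's description of TED as confirmation-oriented.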