SSL4SAR: Self-Supervised Learning for Glacier Calving Front Extraction from SAR Imagery

📅 2025-07-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the domain shift between ImageNet natural images and synthetic aperture radar (SAR) remote sensing imagery, this work proposes two self-supervised multimodal pretraining techniques built on SSL4SAR, a new unlabeled dataset of Sentinel-1 and Sentinel-2 images of Arctic glaciers, and introduces a hybrid architecture that combines a Swin Transformer encoder with a residual CNN decoder for glacier calving front extraction. Pretraining on SSL4SAR replaces the ImageNet-pretrained weights used by the prior state-of-the-art model. On the CaFFe benchmark, the model achieves a mean distance error of 293 m, 67 m better than the prior best model. On a multi-annotator study of the benchmark, an ensemble of the proposed model reaches a mean distance error of 75 m, compared with 38 m for human annotators. The authors state that this accuracy enables precise monitoring of seasonal changes in glacier calving fronts.
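The summary does not say how the ensemble combines its members. As a rough illustration only, the sketch below assumes the simplest common scheme: average each pixel's predicted front probability across models, then threshold. The function name `ensemble_average`, the 0.5 threshold, and the nested-list map format are hypothetical and not taken from the paper.

```python
def ensemble_average(prob_maps, threshold=0.5):
    """Average per-pixel probabilities from several models, then threshold.

    prob_maps: list of 2D lists (one per model), all with the same shape,
    holding the predicted probability that a pixel lies on the calving front.
    Averaging is an assumed combination rule, not the paper's stated method.
    """
    if not prob_maps:
        raise ValueError("at least one probability map is required")
    rows, cols = len(prob_maps[0]), len(prob_maps[0][0])
    mask = []
    for r in range(rows):
        row = []
        for c in range(cols):
            mean = sum(m[r][c] for m in prob_maps) / len(prob_maps)
            row.append(1 if mean >= threshold else 0)
        mask.append(row)
    return mask


# Three hypothetical model outputs for a 1 x 2 image.
maps = [[[0.9, 0.2]], [[0.6, 0.4]], [[0.3, 0.1]]]
front_mask = ensemble_average(maps)
```

Averaging probabilities before thresholding lets a confident member outvote uncertain ones, which majority voting on hard masks would not.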

📝 Abstract

Glaciers are losing ice mass at unprecedented rates, increasing the need for accurate, year-round monitoring to understand frontal ablation, particularly the factors driving the calving process. Deep learning models can extract calving front positions from Synthetic Aperture Radar imagery to track seasonal ice losses at the calving fronts of marine- and lake-terminating glaciers. The current state-of-the-art model relies on ImageNet-pretrained weights. However, they are suboptimal due to the domain shift between the natural images in ImageNet and the specialized characteristics of remote sensing imagery, in particular for Synthetic Aperture Radar imagery. To address this challenge, we propose two novel self-supervised multimodal pretraining techniques that leverage SSL4SAR, a new unlabeled dataset comprising 9,563 Sentinel-1 and 14 Sentinel-2 images of Arctic glaciers, with one optical image per glacier in the dataset. Additionally, we introduce a novel hybrid model architecture that combines a Swin Transformer encoder with a residual Convolutional Neural Network (CNN) decoder. When pretrained on SSL4SAR, this model achieves a mean distance error of 293 m on the "CAlving Fronts and where to Find thEm" (CaFFe) benchmark dataset, outperforming the prior best model by 67 m. Evaluating an ensemble of the proposed model on a multi-annotator study of the benchmark dataset reveals a mean distance error of 75 m, approaching the human performance of 38 m. This advancement enables precise monitoring of seasonal changes in glacier calving fronts.
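The abstract reports results as a mean distance error in metres but does not define it here. The sketch below assumes a symmetric formulation: each predicted front pixel is matched to its nearest ground-truth front pixel and vice versa, and the distances are averaged and scaled by the pixel size. The function name `mean_distance_error`, the `pixel_size_m` parameter, and the `(row, col)` pixel format are hypothetical; the benchmark's exact definition may differ.

```python
import math


def mean_distance_error(pred_front, true_front, pixel_size_m=1.0):
    """Symmetric mean nearest-neighbour distance between two fronts, in metres.

    pred_front, true_front: non-empty lists of (row, col) pixel coordinates.
    pixel_size_m: assumed ground resolution of one pixel in metres.
    """
    if not pred_front or not true_front:
        raise ValueError("both fronts must contain at least one pixel")

    def nearest(p, others):
        # Euclidean distance in pixels from p to the closest point in others.
        return min(math.hypot(p[0] - q[0], p[1] - q[1]) for q in others)

    total = sum(nearest(p, true_front) for p in pred_front)
    total += sum(nearest(q, pred_front) for q in true_front)
    return pixel_size_m * total / (len(pred_front) + len(true_front))


# Two parallel vertical fronts three pixels apart, at 10 m per pixel.
pred = [(0, 0), (1, 0), (2, 0)]
true = [(0, 3), (1, 3), (2, 3)]
error_m = mean_distance_error(pred, true, pixel_size_m=10.0)  # 30.0
```

Measuring in both directions penalizes a prediction that covers only part of the true front, which a one-sided distance would miss.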

Problem

Research questions and friction points this paper is trying to address.

Extracting glacier calving fronts from SAR imagery accurately

Overcoming domain shift in pretrained models for remote sensing

Improving monitoring of seasonal glacier changes via deep learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised multimodal pretraining techniques

Hybrid Swin Transformer-CNN architecture

SSL4SAR, a new unlabeled dataset of 9,563 Sentinel-1 and 14 Sentinel-2 images of Arctic glaciers

🔎 Similar Papers

Globally scalable glacier mapping by deep learning matches expert delineation accuracy