SSL4SAR: Self-Supervised Learning for Glacier Calving Front Extraction from SAR Imagery

📅 2025-07-02
🤖 AI Summary
Synthetic aperture radar (SAR) remote sensing imagery differs substantially from the natural images in ImageNet, so ImageNet-pretrained weights transfer poorly. To address this domain shift, this work proposes two self-supervised multimodal pretraining strategies and a hybrid architecture that combines a Swin Transformer encoder with a residual CNN decoder for glacier calving front localization. Pretraining uses the unlabeled SSL4SAR dataset instead of ImageNet-supervised weights. On the CaFFe benchmark, the single model achieves a mean distance error of 293 m, improving on the prior best model by 67 m. On a multi-annotator study of the benchmark, an ensemble of the proposed model reaches a mean distance error of 75 m, compared with 38 m for human annotators. The authors present this as enabling precise, year-round monitoring of seasonal changes in glacier calving fronts.

📝 Abstract
Glaciers are losing ice mass at unprecedented rates, increasing the need for accurate, year-round monitoring to understand frontal ablation, particularly the factors driving the calving process. Deep learning models can extract calving front positions from Synthetic Aperture Radar imagery to track seasonal ice losses at the calving fronts of marine- and lake-terminating glaciers. The current state-of-the-art model relies on ImageNet-pretrained weights. However, they are suboptimal due to the domain shift between the natural images in ImageNet and the specialized characteristics of remote sensing imagery, in particular for Synthetic Aperture Radar imagery. To address this challenge, we propose two novel self-supervised multimodal pretraining techniques that leverage SSL4SAR, a new unlabeled dataset comprising 9,563 Sentinel-1 and 14 Sentinel-2 images of Arctic glaciers, with one optical image per glacier in the dataset. Additionally, we introduce a novel hybrid model architecture that combines a Swin Transformer encoder with a residual Convolutional Neural Network (CNN) decoder. When pretrained on SSL4SAR, this model achieves a mean distance error of 293 m on the "CAlving Fronts and where to Find thEm" (CaFFe) benchmark dataset, outperforming the prior best model by 67 m. Evaluating an ensemble of the proposed model on a multi-annotator study of the benchmark dataset reveals a mean distance error of 75 m, approaching the human performance of 38 m. This advancement enables precise monitoring of seasonal changes in glacier calving fronts.
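The results above are reported as a mean distance error in metres. The abstract does not define the metric, so the following is only a minimal sketch of one common symmetric formulation: average the nearest-neighbour distance from every predicted front pixel to the reference front and vice versa, then scale by the ground resolution. The function name `mean_distance_error`, the `metres_per_pixel` parameter, and the toy coordinates are assumptions made for illustration, not the benchmark's evaluation code.

```python
import math


def mean_distance_error(predicted, reference, metres_per_pixel=1.0):
    """Symmetric mean nearest-neighbour distance between two front lines.

    predicted, reference: non-empty lists of (row, col) pixel coordinates.
    metres_per_pixel: assumed ground resolution for converting pixels to metres.
    """
    if not predicted or not reference:
        raise ValueError("both fronts need at least one pixel")

    def nearest(point, others):
        # Euclidean distance from one pixel to the closest pixel of the other front.
        return min(math.dist(point, other) for other in others)

    total = sum(nearest(p, reference) for p in predicted)
    total += sum(nearest(q, predicted) for q in reference)
    return metres_per_pixel * total / (len(predicted) + len(reference))


# Toy example (hypothetical data): a vertical reference front and a
# prediction shifted sideways by 3 pixels, at an assumed 20 m per pixel.
reference_front = [(r, 10) for r in range(5)]
predicted_front = [(r, 13) for r in range(5)]
mde = mean_distance_error(predicted_front, reference_front, metres_per_pixel=20.0)
print(mde)
```

In this toy case every pixel is exactly 3 pixels from the other front, so the error is 3 pixels times 20 m. The brute-force search is quadratic in the number of front pixels, which is fine for a sketch but would likely be replaced by a distance transform or a spatial index for full-size images.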
Problem

Research questions and friction points this paper is trying to address.

Accurately extracting glacier calving fronts from SAR imagery
Overcoming domain shift in pretrained models for remote sensing
Improving monitoring of seasonal glacier changes via deep learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised multimodal pretraining techniques
Hybrid Swin Transformer-CNN architecture
Use of the unlabeled SSL4SAR dataset for pretraining
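This page does not say which two pretraining objectives the paper uses. Purely as an illustration of how SAR and optical data can be combined in a self-supervised loss, here is a generic contrastive (InfoNCE-style) objective in plain Python. It is not claimed to be the authors' method. The function name `info_nce`, the one-to-one pairing of embeddings, and the `temperature` value are all assumptions made for this sketch.

```python
import math


def info_nce(sar_embeddings, optical_embeddings, temperature=0.1):
    """Average contrastive loss over a batch.

    Row i of sar_embeddings and row i of optical_embeddings are assumed to
    form a matching pair; every other row in the batch acts as a negative.
    """
    def normalise(vector):
        norm = math.sqrt(sum(x * x for x in vector))
        return [x / norm for x in vector]

    sar = [normalise(v) for v in sar_embeddings]
    optical = [normalise(v) for v in optical_embeddings]

    loss = 0.0
    for i, anchor in enumerate(sar):
        # Cosine similarity of this SAR embedding to every optical embedding.
        logits = [sum(a * b for a, b in zip(anchor, o)) / temperature for o in optical]
        # Numerically stable log-sum-exp over the batch.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Cross-entropy with the matching pair as the correct class.
        loss += log_denominator - logits[i]
    return loss / len(sar)


# Hypothetical 2-D embeddings: matched pairs should score a lower loss
# than the same embeddings paired with the wrong partner.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

In a real pipeline the embeddings would come from an encoder (such as the Swin Transformer mentioned above) rather than hand-written vectors. How SAR and optical images are actually paired in SSL4SAR, which has far more Sentinel-1 than Sentinel-2 images, is not described here.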
Nora Gourmelon
Friedrich-Alexander-Universität
Deep Learning, Climate Change, Sustainability, Machine Learning
Marcel Dreier
Pattern Recognition Lab, Computer Science Department, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Martin Mayr
Erlangen National High Performance Computing Center (NHR@FAU), Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Thorsten Seehaus
Unknown affiliation
Dakota Pyles
Institut für Geographie, Department of Geography and Geosciences, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Matthias Braun
Institut für Geographie, Department of Geography and Geosciences, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Andreas Maier
Pattern Recognition Lab, Computer Science Department, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Vincent Christlein
University Erlangen-Nuremberg
Computer Vision, Document Analysis, Art Analysis, Computational Humanities, AI4Conservation