🤖 AI Summary
This work addresses a weakness of existing self-supervised learning (SSL) methods for aerial imagery. These methods enforce alignment between augmented views, so when one view is severely degraded (for example by haze, blur, or occlusion), the alignment can introduce spurious representations and compromise robustness. To mitigate this, the authors propose an uncertainty-aware SSL framework that modulates the contrastive objective with per-sample, per-factor trust weights. The trust weight is passed through a stop-gradient operator, and the trust-weighted term is added to the base contrastive loss as an additive residual; a multiplicative gate, by contrast, degraded the backbone in the authors' experiments. An evidential variant based on Dempster–Shafer fusion additionally provides interpretable signals of conflict and ignorance. Evaluated on EuroSAT, AID, and NWPU-RESISC45, the method achieves a mean linear-probe accuracy of 90.20%, the highest among the six backbones compared. It outperforms SimCLR by 19.9 percentage points on EuroSAT under haze at severity s=5, and it improves zero-shot cross-domain Mahalanobis AUROC on the BDD100K weather splits by 1 to 3 points.
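The summary does not give the exact objective, so the sketch below only illustrates the difference between an additive residual and a multiplicative gate under stated assumptions. It assumes each sample i has a per-sample loss l_i and a single trust weight t_i in [0, 1] (collapsing the paper's per-factor weights into one scalar). It also assumes the additive form is L = mean(l_i) + lam * mean(sg(t_i) * l_i) and the gate is L = mean(t_i * l_i). The function names `additive_residual_loss` and `multiplicative_gate_loss` and the parameter `lam` are hypothetical. Each function returns the loss together with its analytic gradients with respect to l_i and t_i, so the effect of each design is visible without an autodiff framework.

```python
from typing import List, Tuple

# Each function returns (loss, dL/dl_i, dL/dt_i) for per-sample losses l_i
# and trust weights t_i. The formulas are illustrative assumptions, not the paper's.
Result = Tuple[float, List[float], List[float]]


def additive_residual_loss(losses: List[float], trust: List[float],
                           lam: float = 1.0) -> Result:
    """Assumed additive form: L = mean(l_i) + lam * mean(sg(t_i) * l_i)."""
    n = len(losses)
    base = sum(losses) / n
    residual = sum(t * l for t, l in zip(trust, losses)) / n
    # Every sample keeps the base weight 1/n; trust only adds extra weight.
    grad_l = [(1.0 + lam * t) / n for t in trust]
    # Stop-gradient: t_i is treated as a constant, so no gradient reaches it.
    grad_t = [0.0] * n
    return base + lam * residual, grad_l, grad_t


def multiplicative_gate_loss(losses: List[float], trust: List[float],
                             stop_grad: bool = False) -> Result:
    """Assumed gated form: L = mean(t_i * l_i); a sample with t_i = 0 contributes nothing."""
    n = len(losses)
    loss = sum(t * l for t, l in zip(trust, losses)) / n
    # The gradient on l_i is scaled by t_i, so distrusted samples are silenced.
    grad_l = [t / n for t in trust]
    # Without a stop-gradient, dL/dt_i = l_i / n flows back into the trust weights.
    grad_t = [0.0] * n if stop_grad else [l / n for l in losses]
    return loss, grad_l, grad_t


if __name__ == "__main__":
    losses, trust = [2.0, 1.0], [0.0, 1.0]
    print(additive_residual_loss(losses, trust))    # (2.0, [0.5, 1.0], [0.0, 0.0])
    print(multiplicative_gate_loss(losses, trust))  # (0.5, [0.0, 0.5], [1.0, 0.5])
```

With trust = [0.0, 1.0], the additive form still gives the distrusted sample the base gradient 1/n, while the gate removes its gradient entirely. Without a stop-gradient, the gate also yields dL/dt_i = l_i / n >= 0, so gradient descent on a learned t_i would push it toward 0. This is a property of the assumed formulas, not a claim about why the gate underperformed in the paper.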
📝 Abstract
Self-supervised learning (SSL) is a standard approach for representation learning in aerial imagery. Existing methods enforce invariance between augmented views, which works well when augmentations preserve semantic content. However, aerial images are frequently degraded by haze, motion blur, rain, and occlusion that remove critical evidence. Enforcing alignment between a clean and a severely degraded view can introduce spurious structure into the latent space. This study proposes a training strategy and architectural modification to make SSL more robust to such corruptions. It introduces a per-sample, per-factor trust weight into the alignment objective. The trust weight is passed through a stop-gradient, and the weighted term is combined with the base contrastive loss as an additive residual rather than as a multiplicative gate. Although a multiplicative gate is the natural choice, experiments show that it impairs the backbone, whereas our additive-residual approach improves it. Using a 200-epoch protocol on a 210,000-image corpus, the method achieves the highest mean linear-probe accuracy among six backbones on EuroSAT, AID, and NWPU-RESISC45 (90.20%, compared with 88.46% for SimCLR and 89.82% for VICReg). The largest improvements appear under severe information-erasing corruptions on EuroSAT (+19.9 points over SimCLR on haze at severity s=5). The method also yields consistent gains of +1 to +3 points in Mahalanobis AUROC on a zero-shot cross-domain stress test using BDD100K weather splits. Two ablations (scalar uncertainty and cosine gate) indicate that the additive-residual formulation is the primary source of these improvements. An evidential variant using Dempster-Shafer fusion provides interpretable signals of conflict and ignorance. These findings offer a concrete design principle for uncertainty-aware SSL. Code is publicly available at https://github.com/WadiiBoulila/trust-ssl.
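The abstract names conflict and ignorance as the evidential variant's signals but does not define them. A common reading, assumed here, comes from Dempster's rule of combination. Under that reading, conflict K is the product mass that falls on the empty set, and ignorance is the fused mass left on the whole frame Θ. The sketch below combines two hypothetical mass functions (for example, evidence derived from two augmented views of one image) over a toy frame of two scene classes. The function names `dempster_combine` and `ignorance` and the labels "a" and "b" are illustrative, not taken from the trust-ssl code.

```python
from itertools import product
from typing import Dict, FrozenSet, Tuple

# A mass function maps focal sets (subsets of the frame) to belief mass.
Mass = Dict[FrozenSet[str], float]


def dempster_combine(m1: Mass, m2: Mass) -> Tuple[Mass, float]:
    """Fuse two mass functions with Dempster's rule.

    Returns the normalized fused masses and the conflict K, i.e. the total
    product mass assigned to the empty set before normalization.
    """
    raw: Mass = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            raw[inter] = raw.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb  # the two sources support disjoint hypotheses
    if conflict >= 1.0 - 1e-12:
        raise ValueError("total conflict: the sources cannot be combined")
    # Renormalize the non-conflicting mass so the fused masses sum to 1.
    fused = {s: v / (1.0 - conflict) for s, v in raw.items()}
    return fused, conflict


def ignorance(m: Mass, frame: FrozenSet[str]) -> float:
    """Mass left on the whole frame: belief not committed to any smaller subset."""
    return m.get(frame, 0.0)


if __name__ == "__main__":
    frame = frozenset({"a", "b"})
    view1 = {frozenset({"a"}): 0.6, frame: 0.4}  # hypothetical evidence for class a
    view2 = {frozenset({"b"}): 0.5, frame: 0.5}  # hypothetical evidence for class b
    fused, k = dempster_combine(view1, view2)
    print(round(k, 3))                          # 0.3
    print(round(ignorance(fused, frame), 3))    # 0.286
```

A high K indicates that the two sources disagree (one view points to class a, the other to class b), while a large m(Θ) indicates that neither source committed much evidence to a specific class.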