🤖 AI Summary
This work addresses the inefficiency of existing self-supervised pre-training methods for remote sensing image semantic segmentation under data-scarce conditions, where reliance on large volumes of unlabeled data limits transferability. To overcome this limitation, the authors propose a novel self-supervised pre-training task—sub-image overlap position prediction—that aligns closely with the downstream segmentation objective by learning semantic representations through predicting the relative positions of cropped sub-images within their original context. This approach substantially reduces the required amount of pre-training data while enhancing segmentation performance when labeled samples are scarce. Experimental results demonstrate that the proposed method consistently achieves comparable or superior mean Intersection over Union (mIoU) across multiple remote sensing benchmarks and mainstream network architectures, using less data and converging faster than conventional approaches.
📝 Abstract
Self-supervised learning (SSL) methods have become a dominant paradigm for creating general-purpose models whose capabilities can be transferred to downstream supervised learning tasks. However, most such methods rely on vast amounts of pretraining data. This work introduces Subimage Overlap Prediction, a novel self-supervised pretraining task for semantic segmentation of remote sensing imagery that requires significantly less pretraining data. Given an image, a sub-image is extracted, and the model is trained to produce a semantic mask of the location of the extracted sub-image within the original image. We demonstrate that pretraining with this task yields significantly faster convergence and equal or better performance (measured via mIoU) on downstream segmentation; this gap in convergence and performance widens as labeled training data is reduced. We show this across multiple architecture types and multiple downstream datasets. We also show that our method matches or exceeds the performance of other SSL methods while requiring significantly less pretraining data. Code and model weights are provided at \href{https://github.com/sharmalakshay93/subimage-overlap-prediction}{github.com/sharmalakshay93/subimage-overlap-prediction}.
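The pretraining target described above can be sketched as a simple data-generation step: crop a random sub-image and build a binary mask marking where it came from. This is a minimal illustration, not the authors' implementation; the function name, crop fraction, and array layout are assumptions for the sketch.

```python
import numpy as np

def make_overlap_sample(image: np.ndarray, crop_frac: float = 0.4, rng=None):
    """Build one (sub_image, mask) training pair for overlap prediction.

    The mask is 1 inside the region the sub-image was cropped from and
    0 elsewhere -- the segmentation-style target the model learns to
    predict from the (image, sub_image) pair. Hypothetical helper; the
    paper's actual pipeline may differ.
    """
    rng = np.random.default_rng(rng)
    h, w = image.shape[:2]
    ch, cw = int(h * crop_frac), int(w * crop_frac)

    # Sample a random top-left corner so the crop fits inside the image.
    top = int(rng.integers(0, h - ch + 1))
    left = int(rng.integers(0, w - cw + 1))

    sub_image = image[top:top + ch, left:left + cw]
    mask = np.zeros((h, w), dtype=np.uint8)
    mask[top:top + ch, left:left + cw] = 1
    return sub_image, mask

# Example: a 256x256 RGB image yields a 102x102 crop and a 256x256 binary mask.
img = np.zeros((256, 256, 3), dtype=np.uint8)
sub, mask = make_overlap_sample(img, crop_frac=0.4, rng=0)
```

Because the target is a dense per-pixel mask rather than a global label, the pretraining objective has the same output structure as the downstream segmentation task, which is the alignment the paper credits for faster convergence.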