A generalised pre-training strategy for deep learning networks in semantic segmentation of remotely sensed images

📅 2026-04-30
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses two problems in semantic segmentation of remote sensing images: ImageNet-pretrained models transfer poorly because of large domain gaps, and domain-specific pretraining datasets are costly to build and generalize poorly. The authors propose a general-purpose pretraining strategy that steers the model away from learning features specific to the pretraining (source) dataset, with the aim of improving how well the pretrained model generalizes. Models pretrained on ImageNet with this strategy and then fine-tuned reach state-of-the-art accuracy on four benchmark datasets: 67.4% mIoU on iSAID, 56.9% mIoU on MFNet, 84.22% mIoU on PST900, and 91.88% mF1 on Potsdam. The authors present this as groundwork for unified foundation models that serve both remote sensing and general computer vision tasks.
📝 Abstract
In the segmentation of remotely sensed images, deep learning models are typically pre-trained on large image databases such as ImageNet before being fine-tuned on domain-specific datasets. However, the performance of these fine-tuned models is often hindered by the large domain gaps (i.e., differences in scenes and modalities) between ImageNet's images and the remotely sensed images being processed. Many researchers have therefore worked to establish large-scale domain-specific image datasets for pre-training, aiming to enhance model performance. Establishing such datasets is challenging and requires significant effort, and the resulting datasets often exhibit limited generalizability to other application scenarios. To address these issues, this study introduces a novel yet simple pre-training strategy designed to guide a model away from learning the domain-specific features of the pre-training dataset, thereby improving the generalisation ability of the pre-trained model. To evaluate the strategy's effectiveness, deep learning models are pre-trained on ImageNet and subsequently fine-tuned on four semantic segmentation datasets with diverse scenes and modalities: iSAID, MFNet, PST900 and Potsdam. Experimental results show that the proposed pre-training strategy leads to state-of-the-art accuracies on all four datasets, namely 67.4% mIoU for iSAID, 56.9% mIoU for MFNet, 84.22% mIoU for PST900 and 91.88% mF1 for Potsdam. This research lays the groundwork for developing a unified foundation model applicable to both computer vision and remote sensing applications.
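The results above are reported as mIoU (mean intersection over union) and mF1 (mean F1 score). As a minimal sketch of how these two metrics are commonly computed from a per-pixel confusion matrix, the following uses the standard textbook definitions. The function names and the toy labels are hypothetical, and the paper's exact evaluation protocol (for example, which classes are ignored) is not stated in this summary, so this should not be read as the authors' evaluation code.

```python
from typing import List, Tuple


def confusion_matrix(y_true: List[int], y_pred: List[int], num_classes: int) -> List[List[int]]:
    """Build a confusion matrix; rows are reference classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def miou_and_mf1(cm: List[List[int]]) -> Tuple[float, float]:
    """Return (mIoU, mF1) averaged over classes that occur in the data."""
    n = len(cm)
    ious, f1s = [], []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # reference pixels of class c that were missed
        fp = sum(cm[r][c] for r in range(n)) - tp  # pixels wrongly predicted as class c
        denom = tp + fp + fn
        if denom == 0:
            # Assumption: a class absent from both reference and prediction is skipped.
            continue
        ious.append(tp / denom)                 # IoU = TP / (TP + FP + FN)
        f1s.append(2 * tp / (2 * tp + fp + fn))  # F1 = 2TP / (2TP + FP + FN)
    if not ious:
        raise ValueError("confusion matrix contains no samples")
    return sum(ious) / len(ious), sum(f1s) / len(f1s)


# Toy example: six "pixels" with three classes (hypothetical data).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
miou, mf1 = miou_and_mf1(cm)
print(f"mIoU = {miou:.4f}, mF1 = {mf1:.4f}")
```

For any single class, F1 is at least as large as IoU, so mF1 figures are not directly comparable with mIoU figures across datasets.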
Problem

Research questions and friction points this paper is trying to address.

domain gap
semantic segmentation
remote sensing
pre-training
generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

domain generalization
pre-training strategy
semantic segmentation
remote sensing
foundation model