🤖 AI Summary
To address the low segmentation accuracy of small objects (e.g., grass, clouds) in remote sensing imagery and the performance degradation that pre-trained vision transformer models (VTMs) suffer under domain shift, this paper proposes an end-to-end joint learning framework that integrates knowledge guidance with domain refinement. We design two modules, a Feature Alignment Module (FAM) and a Feature Modulation Module (FMM), that fuse a CNN backbone with a VTM encoder. Cross-domain feature alignment is achieved via channel-wise transformation, spatial interpolation, KL-divergence regularization, and L2 normalization, while domain-adaptive modulation enhances transferability. We also construct the first fine-grained grass segmentation dataset. Experiments show mIoU improvements of 2.57 and 3.73 on grass segmentation and cloud detection benchmarks, respectively, effectively mitigating domain shift and alleviating label scarcity.
📝 Abstract
Fine-grained remote sensing image segmentation is essential for accurately identifying detailed objects in remote sensing images. Recently, vision transformer models (VTMs) pre-trained on large-scale datasets have demonstrated strong zero-shot generalization. However, directly applying them to a specific task can lead to domain shift. We introduce a novel end-to-end learning paradigm that combines knowledge guidance with domain refinement to enhance performance, built on two key components: the Feature Alignment Module (FAM) and the Feature Modulation Module (FMM). FAM aligns features from a CNN-based backbone with those from the pre-trained VTM's encoder using channel transformation and spatial interpolation, and transfers knowledge via a KL-divergence loss and an L2 normalization constraint. FMM further adapts the transferred knowledge to the target domain to address domain shift. We also introduce a fine-grained grass segmentation dataset and demonstrate, through experiments on two datasets, that our method achieves significant improvements of 2.57 mIoU on the grass dataset and 3.73 mIoU on the cloud dataset. These results highlight the potential of combining knowledge transfer with domain adaptation to overcome domain-related challenges and data limitations. The project page is available at https://xavierjiezou.github.io/KTDA/.
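The alignment idea described above can be sketched in a few lines: a channel transformation maps CNN features to the VTM's channel width, spatial interpolation matches the resolutions, and a KL-divergence term plus an L2 normalization constraint couple the two feature sets. The NumPy sketch below is only an illustration of that recipe under assumed toy shapes; all names, shapes, and the nearest-neighbour resize are our own simplifying assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def channel_transform(feat, weight):
    # 1x1 convolution as a matrix multiply over the channel axis:
    # (C_s, H, W) -> (C_t, H, W)
    c_s, h, w = feat.shape
    return (weight @ feat.reshape(c_s, h * w)).reshape(-1, h, w)

def nearest_resize(feat, out_h, out_w):
    # nearest-neighbour spatial interpolation (a simple stand-in
    # for the bilinear interpolation a real model would use)
    c, h, w = feat.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return feat[:, rows][:, :, cols]

def softmax(x, axis=0):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def kl_divergence(p, q, eps=1e-8):
    # per-pixel KL(p || q) over the channel axis, averaged spatially
    return float((p * (np.log(p + eps) - np.log(q + eps))).sum(axis=0).mean())

def l2_normalize(feat, eps=1e-8):
    # unit-normalize each pixel's channel vector
    return feat / (np.linalg.norm(feat, axis=0, keepdims=True) + eps)

# toy tensors: CNN feature map (64, 32, 32), VTM feature map (128, 16, 16)
cnn_feat = rng.standard_normal((64, 32, 32))
vit_feat = rng.standard_normal((128, 16, 16))
w = rng.standard_normal((128, 64)) * 0.1  # hypothetical 1x1-conv weight

# align channels, then spatial size, then compute the two transfer losses
aligned = nearest_resize(channel_transform(cnn_feat, w), 16, 16)
kd_loss = kl_divergence(softmax(vit_feat), softmax(aligned))
l2_loss = float(((l2_normalize(aligned) - l2_normalize(vit_feat)) ** 2).mean())
```

In a trained model the 1x1-conv weight would be learned and both losses would be backpropagated into the CNN backbone; here they are computed once on random tensors purely to show the data flow.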