🤖 AI Summary
To address semantic inconsistency and multimodal output interference in unsupervised cross-temporal translation of bi-temporal remote sensing images—both of which hinder reliable change detection—this paper proposes a deterministic, unimodal cross-temporal translation method. The approach builds on a shared-encoder GAN framework and requires no pixel-level paired supervision. Its core innovations are: (1) a dual-path generator with shared high-level weights, which forces both temporal images to map into a unified semantic latent space; and (2) a cross-cycle consistency adversarial mechanism coupled with latent-space alignment constraints, which jointly ensure semantic robustness and perceptual fidelity. Evaluated on multiple remote sensing benchmarks, the method significantly improves semantic consistency over state-of-the-art unsupervised methods (mIoU +12.3%) and achieves superior image fidelity (LPIPS −0.18). It thus delivers reliable, interpretable translations that robustly support downstream change detection tasks.
📝 Abstract
Image translation for change detection or classification in bi-temporal remote sensing images is a unique task: although paired images can be acquired, translation between them remains unsupervised, and strict semantic preservation is required rather than multimodal outputs. In response to these problems, this paper proposes a new method, SRUIT (Semantically Robust Unsupervised Image-to-image Translation), which ensures semantically robust translation and produces deterministic output. Inspired by previous works, the method explores the underlying characteristics of bi-temporal remote sensing images and designs the corresponding networks. Firstly, we assume that bi-temporal remote sensing images share the same latent space, since they are always acquired at the same land location. SRUIT therefore makes the generators share their high-level layers, a constraint that compels the two domain mappings to fall into the same latent space. Secondly, considering that the land covers of bi-temporal images can evolve into each other, SRUIT exploits cross-cycle-consistent adversarial networks to translate each image into the other and recover it. Experimental results show that the constraints of weight sharing and cross-cycle consistency yield translated images with both good perceptual quality and semantic preservation despite significant differences.
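The two structural constraints described above can be sketched in a few lines of Python. This is a minimal toy illustration under our own assumptions, not the paper's actual networks: `SharedEncoder`, `Generator`, and `cross_cycle_loss` are hypothetical stand-ins (simple scalar arithmetic instead of convolutional layers), showing how two generators reuse one encoder object so both domain mappings pass through the same latent space, and how a cross-cycle reconstruction error is measured.

```python
# Toy sketch of SRUIT's two constraints (hypothetical stand-ins, not
# the paper's networks): weight sharing via one shared encoder object,
# and a cross-cycle (translate-then-recover) reconstruction loss.

class SharedEncoder:
    """High-level layers shared by both generators (toy version)."""
    def __init__(self, scale=0.5):
        self.scale = scale  # single shared "weight"

    def __call__(self, x):
        # Map an input into the common latent space.
        return [v * self.scale for v in x]


class Generator:
    """Domain-specific decoder on top of the shared high-level encoder."""
    def __init__(self, shared, offset):
        self.shared = shared   # same object for both generators -> shared weights
        self.offset = offset   # domain-specific "low-level" parameter

    def __call__(self, x):
        z = self.shared(x)                   # shared latent representation
        return [v + self.offset for v in z]  # domain-specific decoding


def l1(a, b):
    """Mean absolute error, standing in for a reconstruction loss."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


shared = SharedEncoder()
g_ab = Generator(shared, offset=+1.0)  # generator: domain A -> B
g_ba = Generator(shared, offset=-1.0)  # generator: domain B -> A

x_a = [2.0, 4.0, 6.0]        # toy "image" from domain A
x_ab = g_ab(x_a)             # translate A -> B
x_aba = g_ba(x_ab)           # translate back B -> A
cross_cycle_loss = l1(x_aba, x_a)  # penalize failure to recover the original

# Both mappings go through the very same high-level weights:
assert g_ab.shared is g_ba.shared
```

Because both generators hold a reference to the same `SharedEncoder` instance, any update to its parameters affects both domain mappings at once, which is what pushes the two domains into one latent space; in a real framework the same effect is obtained by reusing one module instance inside both generator networks.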