🤖 AI Summary
To address the degradation of semantic segmentation model generalization under visual condition shifts (e.g., day/night, weather), this paper proposes a feature-level domain adaptation method grounded in feature invariance. Leveraging image style transfer as an intermediary, the approach aligns feature representations between stylized and original images at the encoder level, thereby disentangling style variation from semantic structure and enabling condition-agnostic semantic understanding. Its core innovation is the first feature-level invariance loss function explicitly designed for semantic segmentation. Built upon a state-of-the-art unsupervised domain adaptation framework, the method requires no target-domain annotations. Experiments demonstrate that it achieves state-of-the-art performance on Cityscapes→Dark Zurich and ranks second on Cityscapes→ACDC. Moreover, it exhibits strong zero-shot transfer capability to unseen domains—including BDD100K-night and ACDC-night—without fine-tuning.
📝 Abstract
Adaptation of semantic segmentation networks to different visual conditions is vital for robust perception in autonomous cars and robots. However, previous work has shown that most feature-level adaptation methods, which employ adversarial training and are validated on synthetic-to-real adaptation, provide marginal gains in condition-level adaptation, being outperformed by simple pixel-level adaptation via stylization. Motivated by these findings, we propose to leverage stylization in performing feature-level adaptation by aligning the internal network features extracted by the encoder of the network from the original and the stylized view of each input image with a novel feature invariance loss. In this way, we encourage the encoder to extract features that are already invariant to the style of the input, allowing the decoder to focus on parsing these features and not on further abstracting from the specific style of the input. We implement our method, named Condition-Invariant Semantic Segmentation (CISS), on the current state-of-the-art domain adaptation architecture and achieve outstanding results on condition-level adaptation. In particular, CISS sets the new state of the art in the popular daytime-to-nighttime Cityscapes→Dark Zurich benchmark. Furthermore, our method achieves the second-best performance on the normal-to-adverse Cityscapes→ACDC benchmark. CISS is shown to generalize well to domains unseen during training, such as BDD100K-night and ACDC-night. Code is publicly available at https://github.com/SysCV/CISS.
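The core mechanism described above can be sketched in a few lines: an invariance loss penalizes the distance between the encoder features of an image and those of its stylized view, so that minimizing it pushes the encoder toward style-agnostic representations. The sketch below is a hypothetical simplification, not the authors' implementation: it treats the features as flat vectors of floats and uses a plain mean-squared distance, whereas the actual CISS loss operates on multi-channel feature maps inside a full segmentation network.

```python
def feature_invariance_loss(feats_original, feats_stylized):
    """Mean squared distance between encoder features of the original
    image and of its stylized view (illustrative simplification).

    Both arguments are flat lists of floats standing in for encoder
    feature maps. A value of 0.0 means the encoder produced identical
    features for both views, i.e. perfect style invariance.
    """
    if len(feats_original) != len(feats_stylized):
        raise ValueError("feature vectors must have the same length")
    n = len(feats_original)
    return sum((a - b) ** 2 for a, b in zip(feats_original, feats_stylized)) / n


# Identical features for both views yield zero loss (perfect invariance);
# any divergence between the two views is penalized quadratically.
print(feature_invariance_loss([0.1, 0.5, 0.9], [0.1, 0.5, 0.9]))  # 0.0
print(feature_invariance_loss([0.0, 0.0], [2.0, 0.0]))            # 2.0
```

During training, this term would be added to the supervised segmentation loss on the source domain, so that the gradient pulls the two feature views together while the decoder is trained on the shared, style-invariant representation.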