🤖 AI Summary
To address three key challenges in unsupervised domain adaptation (UDA) for semantic segmentation—cross-domain contextual ambiguity, feature representation inconsistency, and category-level pseudo-label noise—this paper proposes a cross-hierarchical mask unification framework. Our method introduces a novel three-tier collaborative masking mechanism: (1) a context-aware mask modeling cross-domain semantic correlations; (2) a feature distillation mask aligning source and target feature representations; and (3) a class-decoupled mask suppressing pseudo-label noise. Hierarchical masks are instantiated via adaptive foreground/background discrimination, knowledge distillation from pretrained models, and class-wise uncertainty modeling, ensuring compatibility with mainstream UDA approaches. Extensive experiments demonstrate state-of-the-art performance: the framework achieves an average mIoU gain of 7.0% on two standard benchmarks—SYNTHIA→Cityscapes and GTA5→Cityscapes—surpassing prior methods.
📝 Abstract
Unsupervised domain adaptation (UDA) enables semantic segmentation models to generalize from a labeled source domain to an unlabeled target domain. However, existing UDA methods still struggle to bridge the domain gap due to cross-domain contextual ambiguity, inconsistent feature representations, and class-wise pseudo-label noise. To address these challenges, we propose Omni-level Masking for Unsupervised Domain Adaptation (OMUDA), a unified framework that introduces hierarchical masking strategies across distinct representation levels. Specifically, OMUDA comprises: 1) a Context-Aware Masking (CAM) strategy that adaptively distinguishes foreground from background to balance global context and local details; 2) a Feature Distillation Masking (FDM) strategy that enhances robust and consistent feature learning through knowledge transfer from pre-trained models; and 3) a Class Decoupling Masking (CDM) strategy that mitigates the impact of noisy pseudo-labels by explicitly modeling class-wise uncertainty. This hierarchical masking paradigm effectively reduces the domain shift at the contextual, representational, and categorical levels, providing a unified solution beyond existing approaches. Extensive experiments on multiple challenging cross-domain semantic segmentation benchmarks validate the effectiveness of OMUDA. Notably, on the SYNTHIA->Cityscapes and GTA5->Cityscapes tasks, OMUDA can be seamlessly integrated into existing UDA methods and consistently achieving state-of-the-art results with an average improvement of 7%.