OMUDA: Omni-level Masking for Unsupervised Domain Adaptation in Semantic Segmentation

📅 2025-12-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address three key challenges in unsupervised domain adaptation (UDA) for semantic segmentation (cross-domain contextual ambiguity, feature representation inconsistency, and category-level pseudo-label noise), this paper proposes OMUDA, a unified framework that applies masking across representation levels. The method introduces a three-tier collaborative masking mechanism: (1) a context-aware mask modeling cross-domain semantic correlations; (2) a feature distillation mask aligning source and target feature representations; and (3) a class-decoupled mask suppressing pseudo-label noise. The masks are instantiated via adaptive foreground/background discrimination, knowledge distillation from pretrained models, and class-wise uncertainty modeling, which keeps the framework compatible with mainstream UDA approaches. In experiments on two standard benchmarks, SYNTHIA→Cityscapes and GTA5→Cityscapes, integrating the framework into existing UDA methods yields an average mIoU gain of 7.0% and state-of-the-art results.

📝 Abstract
Unsupervised domain adaptation (UDA) enables semantic segmentation models to generalize from a labeled source domain to an unlabeled target domain. However, existing UDA methods still struggle to bridge the domain gap due to cross-domain contextual ambiguity, inconsistent feature representations, and class-wise pseudo-label noise. To address these challenges, we propose Omni-level Masking for Unsupervised Domain Adaptation (OMUDA), a unified framework that introduces hierarchical masking strategies across distinct representation levels. Specifically, OMUDA comprises: 1) a Context-Aware Masking (CAM) strategy that adaptively distinguishes foreground from background to balance global context and local details; 2) a Feature Distillation Masking (FDM) strategy that enhances robust and consistent feature learning through knowledge transfer from pre-trained models; and 3) a Class Decoupling Masking (CDM) strategy that mitigates the impact of noisy pseudo-labels by explicitly modeling class-wise uncertainty. This hierarchical masking paradigm effectively reduces the domain shift at the contextual, representational, and categorical levels, providing a unified solution beyond existing approaches. Extensive experiments on multiple challenging cross-domain semantic segmentation benchmarks validate the effectiveness of OMUDA. Notably, on the SYNTHIA→Cityscapes and GTA5→Cityscapes tasks, OMUDA can be seamlessly integrated into existing UDA methods and consistently achieves state-of-the-art results, with an average improvement of 7%.
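The abstract does not spell out how CDM models class-wise uncertainty, so the following is only a minimal sketch of the general idea of filtering pseudo-labels per class rather than with one global confidence threshold. The function names, the quantile rule, and the toy data are all assumptions made for illustration, not the authors' formulation.

```python
from collections import defaultdict

# 255 is the "ignore" label commonly used for Cityscapes-style training;
# its use here is an assumption of this sketch.
IGNORE_INDEX = 255


def class_wise_thresholds(confidences, labels, quantile=0.5):
    """Hypothetical helper: one confidence threshold per predicted class.

    The threshold for a class is the `quantile`-th value of the
    confidences observed for that class, so hard classes with low
    overall confidence are not wiped out by a single global cutoff.
    """
    by_class = defaultdict(list)
    for conf, cls in zip(confidences, labels):
        by_class[cls].append(conf)
    thresholds = {}
    for cls, confs in by_class.items():
        confs.sort()
        idx = min(int(quantile * len(confs)), len(confs) - 1)
        thresholds[cls] = confs[idx]
    return thresholds


def mask_pseudo_labels(confidences, labels, thresholds):
    """Replace pseudo-labels below their class threshold with IGNORE_INDEX."""
    return [
        cls if conf >= thresholds[cls] else IGNORE_INDEX
        for conf, cls in zip(confidences, labels)
    ]


# Toy example: class 0 is an "easy" class, class 1 a "hard" one.
confidences = [0.95, 0.90, 0.60, 0.55, 0.40, 0.35]
labels = [0, 0, 0, 1, 1, 1]
thresholds = class_wise_thresholds(confidences, labels)
masked = mask_pseudo_labels(confidences, labels, thresholds)
print(masked)  # [0, 0, 255, 1, 1, 255]
```

In this toy case a single global threshold of 0.9 would discard every pixel of class 1, whereas the per-class rule keeps the most confident predictions of each class.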
Problem

Research questions and friction points this paper is trying to address.

Addresses domain gap in semantic segmentation via unsupervised adaptation
Reduces contextual ambiguity and inconsistent feature representations across domains
Mitigates class-wise pseudo-label noise to improve segmentation accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical masking strategies across representation levels
Context-aware masking balances global context and local details
Class decoupling masking mitigates noisy pseudo-label impact
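As a rough illustration of what masking that depends on foreground versus background could look like at the patch level, here is a minimal sketch. It assumes each image patch already carries a foreground flag and simply masks the two groups at different rates; the function name, the ratios, and the choice of which group is masked more heavily are placeholders, and the paper's CAM strategy may work differently.

```python
import random


def context_aware_patch_mask(is_foreground, fg_ratio=0.3, bg_ratio=0.7, seed=0):
    """Hypothetical helper: decide per patch whether to mask it.

    is_foreground: list of booleans, one per image patch.
    fg_ratio / bg_ratio: masking probabilities for foreground and
    background patches (illustrative values, not from the paper).
    Returns a list of booleans where True means "mask this patch".
    """
    rng = random.Random(seed)  # local generator for reproducibility
    return [
        rng.random() < (fg_ratio if fg else bg_ratio)
        for fg in is_foreground
    ]


# Example: a row of six patches, the middle two flagged as foreground.
flags = [False, False, True, True, False, False]
mask = context_aware_patch_mask(flags)
print(sum(mask), "of", len(mask), "patches masked")
```

Keeping the two ratios as separate parameters makes the trade-off between preserving local detail and forcing reliance on global context explicit and easy to tune.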
Yang Ou
School of Mechanical Engineering, Chengdu University, Chengdu 610106, China
Xiongwei Zhao
Ph.D Candidate, Harbin Institute of Technology
3D Perception, World Model, LLM, Embodied AI, Autonomous System
Xinye Yang
School of Computer Science and Informatics, Cardiff University, Cardiff CF24 4AG, United Kingdom
Yihan Wang
School of Mechanical Engineering, Southwest Jiaotong University, Chengdu 610031, China
Yicheng Di
Jiangnan University
Distributed Computing, Recommender System, Federated Learning, Meta Learning
Rong Yuan
School of Mechanical Engineering, Chengdu University, Chengdu 610106, China
Xieyuanli Chen
Associate Professor, NUDT, China
Robotics, SLAM, Localization, LiDAR Perception, Robot Learning
Xu Zhu
School of Information Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China