🤖 AI Summary
Lightweight models suffer from low pseudo-label quality and significant performance degradation in unsupervised domain adaptation (UDA) semantic segmentation due to architectural rigidity. To address this, we propose DUDA, a synergistic framework integrating exponential moving average (EMA)-based self-training with knowledge distillation. Specifically, it introduces progressive large-to-small network distillation, an inconsistency-weighted loss that emphasizes hard-to-adapt classes, and a multi-teacher ensemble for robust pseudo-label generation. By jointly leveraging EMA self-training and distillation, the framework enhances the robustness and generalization of lightweight models under domain shift. Evaluated on four standard UDA benchmarks, our approach achieves state-of-the-art performance using only lightweight architectures, outperforming several mainstream heavyweight models across multiple metrics. This work is the first to empirically demonstrate the strong competitiveness of lightweight models in UDA semantic segmentation.
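The summary names two self-training ingredients: EMA teacher updates and multi-teacher pseudo-label generation. The following is a minimal PyTorch sketch of how these pieces are commonly implemented; the function names, the 0.999 momentum, and the 0.9 confidence threshold are illustrative assumptions on my part and are not taken from the paper.

```python
import torch


@torch.no_grad()
def ema_update(teacher, student, alpha=0.999):
    # Exponential moving average of student weights into the teacher,
    # the standard update rule behind EMA-based self-training.
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(alpha).add_(s_param, alpha=1.0 - alpha)


@torch.no_grad()
def ensemble_pseudo_labels(teachers, target_images, threshold=0.9):
    # Average the softmax predictions of several teacher networks on
    # unlabeled target-domain images and keep only confident pixels
    # as pseudo-labels; low-confidence pixels are set to the ignore index.
    probs = torch.stack(
        [t(target_images).softmax(dim=1) for t in teachers]
    ).mean(dim=0)                       # (B, C, H, W) averaged over teachers
    conf, labels = probs.max(dim=1)     # per-pixel confidence and class
    labels[conf < threshold] = 255      # 255 = ignore index (assumed convention)
    return labels
```

This sketch only shows the pseudo-label side; how DUDA couples it with the auxiliary student and distillation stages is described in the abstract below.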
📝 Abstract
Unsupervised Domain Adaptation (UDA) is essential for enabling semantic segmentation in new domains without requiring costly pixel-wise annotations. State-of-the-art (SOTA) UDA methods primarily use self-training with architecturally identical teacher and student networks, relying on Exponential Moving Average (EMA) updates. However, these approaches suffer substantial performance degradation with lightweight models, whose architectural inflexibility leads to low-quality pseudo-labels. To address this, we propose Distilled Unsupervised Domain Adaptation (DUDA), a novel framework that combines EMA-based self-training with knowledge distillation (KD). Our method employs an auxiliary student network to bridge the architectural gap between heavyweight and lightweight models for EMA-based updates, resulting in improved pseudo-label quality. DUDA strategically fuses UDA and KD, incorporating innovative elements such as gradual distillation from large to small networks, an inconsistency loss that prioritizes poorly adapted classes, and learning with multiple teachers. Extensive experiments across four UDA benchmarks demonstrate DUDA's superiority in achieving SOTA performance with lightweight models, often surpassing heavyweight models from other approaches.
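The abstract does not give the exact form of the inconsistency loss, so the sketch below shows one plausible per-pixel reweighting of a standard KL-based distillation loss, where pixels on which student and teacher disagree (typically poorly adapted classes) contribute more. The temperature value and the `1 + disagree` weighting scheme are assumptions for illustration, not the paper's definition.

```python
import torch
import torch.nn.functional as F


def inconsistency_weighted_kd_loss(student_logits, teacher_logits, temperature=2.0):
    # Per-pixel KL distillation between teacher and student segmentation logits
    # of shape (B, C, H, W), reweighted toward pixels where the two disagree.
    t_prob = (teacher_logits / temperature).softmax(dim=1)
    s_logp = (student_logits / temperature).log_softmax(dim=1)
    kd = F.kl_div(s_logp, t_prob, reduction="none").sum(dim=1)   # (B, H, W)

    # Inconsistency weight: upweight pixels whose predicted classes differ.
    disagree = (student_logits.argmax(dim=1) != teacher_logits.argmax(dim=1)).float()
    weight = 1.0 + disagree  # illustrative choice, not the paper's weighting

    return (weight * kd).mean() * temperature ** 2
```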