🤖 AI Summary
This work addresses source-free unsupervised domain adaptation for panoramic semantic segmentation, where geometric distortions, high annotation costs, and domain shift lead to unreliable pseudo-labels. To tackle these issues without access to source-domain data, the authors propose the DAPASS framework, which combines a Panoramic Confidence-Guided Denoising (PCGD) module that generates high-fidelity, class-balanced pseudo-labels by enforcing perturbation consistency and neighborhood-level confidence, with a Contextual Resolution Adversarial Module (CRAM) that adversarially aligns fine-grained features from high-resolution crops with global semantics from low-resolution contexts, thereby mitigating distortion and scale variance. The method achieves state-of-the-art performance on the Cityscapes-to-DensePASS and Stanford2D3D benchmarks, attaining mIoU scores of 55.04% and 70.38%, respectively.
📝 Abstract
Panoramic semantic segmentation is pivotal for comprehensive 360° scene understanding in critical applications like autonomous driving and virtual reality. However, progress in this domain is constrained by two key challenges: the severe geometric distortions inherent in panoramic projections and the prohibitive cost of dense annotation. While Unsupervised Domain Adaptation (UDA) from label-rich pinhole-camera datasets offers a viable alternative, many real-world tasks impose a stricter source-free (SFUDA) constraint, where source data is inaccessible for privacy or proprietary reasons. This constraint significantly amplifies the core problems of domain shift, leading to unreliable pseudo-labels and dramatic performance degradation, particularly for minority classes. To overcome these limitations, we propose the DAPASS framework. DAPASS introduces two synergistic modules to robustly transfer knowledge without source data. First, our Panoramic Confidence-Guided Denoising (PCGD) module generates high-fidelity, class-balanced pseudo-labels by enforcing perturbation consistency and incorporating neighborhood-level confidence to filter noise. Second, a Contextual Resolution Adversarial Module (CRAM) explicitly addresses scale variance and distortion by adversarially aligning fine-grained details from high-resolution crops with global semantics from low-resolution contexts. DAPASS achieves state-of-the-art performance on outdoor (Cityscapes-to-DensePASS) and indoor (Stanford2D3D) benchmarks, yielding 55.04% (+2.05%) and 70.38% (+1.54%) mIoU, respectively.
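To make the PCGD idea concrete, the sketch below illustrates the general pattern of confidence-guided pseudo-label filtering as described in the abstract: a pixel's pseudo-label is kept only if (1) it is stable under a perturbed view of the input, (2) its own confidence is high, and (3) its local neighborhood mostly agrees with it. This is a minimal NumPy illustration under our own assumptions; the function name, thresholds, and the majority-vote proxy for "neighborhood confidence" are hypothetical, not the paper's actual implementation.

```python
import numpy as np

def denoise_pseudo_labels(probs, probs_perturbed,
                          conf_thresh=0.9, agree_thresh=0.6, window=3):
    """Hypothetical sketch of confidence-guided pseudo-label denoising.

    probs, probs_perturbed: (H, W, C) softmax outputs for the original
    image and a perturbed (e.g. augmented) view of it.
    Returns an (H, W) pseudo-label map; rejected pixels are set to -1.
    """
    labels = probs.argmax(axis=-1)
    labels_pert = probs_perturbed.argmax(axis=-1)
    conf = probs.max(axis=-1)

    # 1) Perturbation consistency: keep pixels whose predicted class
    #    is unchanged under the perturbed view.
    consistent = labels == labels_pert

    # 2) Neighborhood confidence (majority-vote proxy): fraction of
    #    pixels in a window x window patch sharing the center's label.
    H, W = labels.shape
    pad = window // 2
    padded = np.pad(labels, pad, mode="edge")
    agree = np.zeros((H, W))
    for dy in range(window):
        for dx in range(window):
            agree += padded[dy:dy + H, dx:dx + W] == labels
    agree /= window * window

    # 3) Combine the three criteria; -1 marks ignored pixels.
    keep = consistent & (conf >= conf_thresh) & (agree >= agree_thresh)
    return np.where(keep, labels, -1)
```

Pixels rejected here would simply be excluded from the self-training loss, so only stable, locally coherent predictions supervise the target model.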