Adversarially Domain-adaptive Latent Diffusion for Unsupervised Semantic Segmentation

📅 2024-12-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address geometric misalignment and distribution shift in unsupervised domain adaptation (UDA) for semantic segmentation, specifically from synthetic domains (GTA5/Synthia) to the real-world Cityscapes domain, this paper proposes the Inter-Coder Connected Latent Diffusion (ICCLD) framework. ICCLD adds an inter-coder connection within the latent diffusion process and optimizes it jointly with adversarial domain alignment. The goal is to improve both contextual modeling and the preservation of fine detail in the latent space. The framework combines this inter-coder connection, a latent-space adversarial alignment loss, and a U-Net-based decoder. On the GTA5→Cityscapes and Synthia→Cityscapes benchmarks, ICCLD achieves 74.4% and 67.2% mean Intersection-over-Union (mIoU), respectively, outperforming the state-of-the-art UDA methods it is compared against. Its key contribution is combining latent diffusion with an inter-coder connection and adversarial feature alignment for synthetic-to-real transfer.
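In standard latent diffusion models, the "latent diffusion process" is a DDPM-style forward process that gradually adds Gaussian noise to an autoencoder's latent code, $z_t = \sqrt{\bar\alpha_t}\, z_0 + \sqrt{1-\bar\alpha_t}\, \epsilon$. The sketch below shows only that generic noising step on a flattened latent vector. The linear β schedule, its constants, and the helper names are assumptions. Neither ICCLD's inter-coder connection nor its adversarial loss is modeled here.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (a common DDPM default, assumed here)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = prod_{s <= t} (1 - beta_s); it shrinks toward 0 as t grows."""
    alpha_bars, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(z0, t, alpha_bars, noise=None, rng=None):
    """Draw z_t ~ N(sqrt(alpha_bar_t) * z0, (1 - alpha_bar_t) * I) for a flattened latent z0.

    Returns the noisy latent and the noise used, which a denoiser would be trained to predict.
    """
    if noise is None:
        rng = rng or random.Random(0)
        noise = [rng.gauss(0.0, 1.0) for _ in z0]
    a = alpha_bars[t]
    z_t = [math.sqrt(a) * z + math.sqrt(1.0 - a) * e for z, e in zip(z0, noise)]
    return z_t, noise
```

For example, `q_sample([0.5, -1.0, 2.0], t=500, alpha_bars=cumulative_alphas(linear_beta_schedule(1000)))` returns a partially noised latent. Large `t` values push it toward pure Gaussian noise.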

📝 Abstract
Semantic segmentation requires extensive pixel-level annotation, motivating unsupervised domain adaptation (UDA) to transfer knowledge from labelled source domains to unlabelled or weakly labelled target domains. One of the most efficient strategies involves using synthetic datasets generated within controlled virtual environments, such as video games or traffic simulators, which can automatically generate pixel-level annotations. However, even when such datasets are available, learning a well-generalised representation that captures both domains remains challenging, owing to probabilistic and geometric discrepancies between the virtual world and real-world imagery. This work introduces a semantic segmentation method based on latent diffusion models, termed Inter-Coder Connected Latent Diffusion (ICCLD), alongside an unsupervised domain adaptation approach. The model employs an inter-coder connection to enhance contextual understanding and preserve fine details, while adversarial learning aligns latent feature distributions across domains during the latent diffusion process. Experiments on GTA5, Synthia, and Cityscapes demonstrate that ICCLD outperforms state-of-the-art UDA methods, achieving mIoU scores of 74.4 (GTA5$\rightarrow$Cityscapes) and 67.2 (Synthia$\rightarrow$Cityscapes).
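The 74.4 and 67.2 figures are mean Intersection-over-Union scores. The sketch below follows the standard definition of that metric and is not the paper's evaluation code. Per-class IoU is TP / (TP + FP + FN), read from a confusion matrix accumulated over all evaluated pixels. mIoU averages these values over classes and is usually reported as a percentage. The function names are hypothetical. The ignore label 255 is an assumption, chosen because it is the value commonly used for unlabelled pixels in Cityscapes train-ID maps. Skipping classes with an empty union is one common convention, and benchmark scripts may differ.

```python
def confusion_matrix(gt, pred, num_classes, ignore_index=255):
    """Accumulate a num_classes x num_classes matrix: rows = ground truth, columns = prediction."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        if g == ignore_index:
            continue  # unlabelled pixels do not count toward any class
        cm[g][p] += 1
    return cm


def mean_iou(cm):
    """Return (mIoU, per-class IoUs), with IoU_c = TP / (TP + FP + FN).

    Classes whose union is empty (absent from both ground truth and prediction) are skipped.
    """
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # class-c pixels predicted as something else
        fp = sum(cm[r][c] for r in range(n)) - tp  # other pixels predicted as class c
        union = tp + fp + fn
        if union > 0:
            ious.append(tp / union)
    miou = sum(ious) / len(ious) if ious else 0.0
    return miou, ious
```

For example, with `gt = [0, 0, 1, 1, 2, 255]` and `pred = [0, 1, 1, 1, 2, 0]`, the per-class IoUs are 0.5, 2/3 and 1.0, so the mIoU is about 0.722 (72.2%).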
Problem

Research questions and friction points this paper is trying to address.

Unsupervised domain adaptation for semantic segmentation
Reducing virtual-real domain discrepancies in segmentation
Enhancing cross-domain feature alignment via latent diffusion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses latent diffusion models for segmentation
Inter-coder connection enhances contextual details
Adversarial learning aligns domain feature distributions
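Adversarial alignment of latent features is commonly implemented GAN-style. A domain discriminator learns to tell source latents from target latents, and the feature extractor is trained to fool it. The abstract does not specify ICCLD's discriminator architecture, its loss, or where in the diffusion process it is applied. The linear discriminator, the plain binary cross-entropy losses, and all names below are therefore hypothetical. The sketch only computes the two opposing losses. In training, these losses would be minimized in alternation, or combined through a gradient reversal layer.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def discriminator(z, w, b):
    """Hypothetical linear domain discriminator: probability that latent z is from the source domain."""
    return sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)


def bce(p, label, eps=1e-12):
    """Binary cross-entropy for one predicted probability p and a label in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def discriminator_loss(src_latents, tgt_latents, w, b):
    """Loss minimized by the discriminator: label source latents 1 and target latents 0."""
    losses = [bce(discriminator(z, w, b), 1) for z in src_latents]
    losses += [bce(discriminator(z, w, b), 0) for z in tgt_latents]
    return sum(losses) / len(losses)


def adversarial_loss(tgt_latents, w, b):
    """Loss minimized by the feature extractor: make target latents look like source (label 1)."""
    return sum(bce(discriminator(z, w, b), 1) for z in tgt_latents) / len(tgt_latents)
```

When the discriminator separates the domains well, its own loss is small and the adversarial loss is large. Minimizing the adversarial loss therefore pushes the target latent distribution toward the source distribution.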