🤖 AI Summary
Unsupervised domain adaptation (UDA) for semantic segmentation suffers from weak cross-domain generalization due to domain-specific feature bias.
Method: This paper proposes leveraging the domain invariance of text embeddings to enhance feature-invariance learning. Its core contribution is a covariance-based pixel-text alignment loss (CoPT): (i) domain-agnostic text descriptions for source and target images are generated via a large language model and encoded into fixed, domain-invariant text embeddings using a frozen CLIP model; (ii) within the image encoder's feature space, CoPT enforces statistical alignment, via covariance constraints, between pixel-level visual features and their corresponding text embeddings.
Contribution/Results: The method operates fully unsupervised (no target labels required) and significantly improves feature discriminability and domain robustness. It achieves new state-of-the-art performance on four standard UDA segmentation benchmarks (e.g., GTA→Cityscapes), boosting mean IoU by up to 3.2%, thereby validating the effectiveness of textual priors in guiding visual domain adaptation.
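The covariance constraint in step (ii) can be illustrated with a minimal sketch. Note this is an assumption-laden surrogate, not the paper's exact formulation: it standardizes pixel features and their assigned class text embeddings, computes their cross-covariance, and pulls it toward the identity (a Barlow-Twins-style objective); the `0.005` off-diagonal weight and the per-dimension standardization are illustrative choices, not values from the paper.

```python
import numpy as np

def covariance_alignment_loss(pixel_feats, text_feats):
    """Sketch of a covariance-driven pixel-text alignment loss.

    pixel_feats: (N, D) pixel features from the image encoder.
    text_feats:  (N, D) frozen, domain-agnostic text embedding assigned
                 to each pixel's class (source labels or pseudo-labels).
    """
    # Standardize each feature dimension across the batch.
    zp = (pixel_feats - pixel_feats.mean(0)) / (pixel_feats.std(0) + 1e-6)
    zt = (text_feats - text_feats.mean(0)) / (text_feats.std(0) + 1e-6)

    # (D, D) cross-covariance between visual and textual features.
    c = zp.T @ zt / len(zp)

    # Pull matching dimensions toward perfect correlation (diagonal -> 1)
    # and decorrelate the rest (off-diagonal -> 0).
    on_diag = ((np.diag(c) - 1.0) ** 2).sum()
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()
    return float(on_diag + 0.005 * off_diag)
```

Under this surrogate, pixel features that co-vary dimension-for-dimension with their class text embeddings incur low loss, which is the intuition behind using covariance statistics rather than per-pixel distances.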
📝 Abstract
Unsupervised domain adaptation (UDA) involves learning class semantics from labeled data within a source domain that generalize to an unseen target domain. UDA methods are particularly impactful for semantic segmentation, where annotations are more difficult to collect than in image classification. Despite recent advances in large-scale vision-language representation learning, UDA methods for segmentation have not taken advantage of the domain-agnostic properties of text. To address this, we present a novel Covariance-based Pixel-Text loss, CoPT, that uses domain-agnostic text embeddings to learn domain-invariant features in an image segmentation encoder. The text embeddings are generated through our LLM Domain Template process, where an LLM is used to generate source and target domain descriptions that are fed to a frozen CLIP model and combined. In experiments on four benchmarks, we show that a model trained using CoPT achieves new state-of-the-art performance on UDA for segmentation. The code can be found at https://github.com/cfmata/CoPT.
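The final step of the LLM Domain Template process, combining the frozen CLIP embeddings of the source- and target-domain descriptions, might look like the following sketch. The simple normalize-and-average combination is an assumption for illustration; the paper's code defines the exact merge, and real CLIP text embeddings would replace the toy vectors here.

```python
import numpy as np

def combine_domain_embeddings(src_embed, tgt_embed):
    """Sketch: merge CLIP text embeddings of LLM-generated source- and
    target-domain descriptions into one domain-agnostic embedding.

    The average of the two unit-normalized embeddings (assumed combination,
    not necessarily the paper's) sits "between" the domains, so neither
    domain's wording dominates the supervisory signal.
    """
    def l2norm(v):
        return v / (np.linalg.norm(v) + 1e-8)

    return l2norm(l2norm(src_embed) + l2norm(tgt_embed))
```

The resulting unit vector is equally similar to both domain descriptions, which is one simple way to realize a "domain-agnostic" text target for the segmentation encoder.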