🤖 AI Summary
Unsupervised domain adaptation (UDA) struggles to simultaneously achieve domain-invariant representations and preserve domain-specific features, primarily because conventional alignment methods enforce proximity of semantically similar samples in the latent space while ignoring their intrinsic domain disparities. To address this, we propose a language-guided structured alignment paradigm: leveraging class-level semantic structures defined by pre-trained vision-language models as priors, we constrain the relative positional relationships (not the absolute coordinates) of equivalent concepts in the latent space to remain consistent across domains. Our approach integrates multimodal embeddings, structured contrastive learning, semantic relation distillation, and unsupervised feature disentanglement. Evaluated across 18 cross-domain scenarios on DomainNet, GeoPlaces, GeoImnet, and EgoExo4D, our method achieves state-of-the-art performance, improving average accuracy by 3.32% on DomainNet, 5.75% on GeoPlaces, and 4.77% on GeoImnet, and mean class accuracy by 1.94% on EgoExo4D. It is the first to jointly optimize cross-domain generalization and intra-domain discriminability.
📝 Abstract
Unsupervised domain adaptation remains a critical challenge in transferring model knowledge to unseen domains. Existing methods struggle to balance domain-invariant representations against the preservation of domain-specific features, often because their alignment objectives force samples with similar semantics to lie close together in the latent space despite drastic domain differences. We introduce LAGUNA (LAnguage Guided UNsupervised Adaptation with structured spaces), a novel approach that shifts the focus from aligning representations in absolute coordinates to aligning the relative positioning of equivalent concepts in latent spaces. LAGUNA defines a domain-agnostic structure based on the semantic/geometric relationships between class labels in language space and uses it to guide adaptation, ensuring that the organization of samples in visual space reflects the reference inter-class relationships while preserving domain-specific characteristics. We empirically demonstrate LAGUNA's superiority in domain adaptation tasks: it surpasses previous works in 18 different adaptation scenarios across four diverse image and video datasets, with average accuracy improvements of +3.32% on DomainNet, +5.75% on GeoPlaces, and +4.77% on GeoImnet, and a +1.94% mean class accuracy improvement on EgoExo4D.
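To make the idea of aligning relative positions rather than absolute coordinates concrete, the following is a minimal NumPy sketch and not LAGUNA's actual objective, which the abstract does not specify. It assumes that the reference structure can be summarized as a matrix of pairwise cosine similarities between class text embeddings, and that the visual structure is the same matrix computed over per-class feature means. The function names (`cosine_similarity_matrix`, `class_prototypes`, `structure_alignment_loss`) are hypothetical, and random arrays stand in for the embeddings a pre-trained vision-language model would provide.

```python
import numpy as np


def cosine_similarity_matrix(x):
    """Pairwise cosine similarity between the rows of x; returns shape (C, C)."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    return x @ x.T


def class_prototypes(features, labels, num_classes):
    """Mean feature vector per class (assumes every class has at least one sample).

    On an unlabeled target domain, `labels` would have to be pseudo-labels.
    """
    return np.stack([features[labels == c].mean(axis=0) for c in range(num_classes)])


def structure_alignment_loss(visual_prototypes, text_embeddings):
    """Mean squared gap between visual and language inter-class similarity structure.

    Only relative positions enter the loss, so the visual and language spaces
    may use different coordinates and even different dimensionalities.
    """
    s_vis = cosine_similarity_matrix(visual_prototypes)
    s_txt = cosine_similarity_matrix(text_embeddings)
    return float(np.mean((s_vis - s_txt) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_classes, dim = 4, 8

    # Stand-in for class-name embeddings from a text encoder (assumption).
    text = rng.normal(size=(num_classes, dim))

    # A "domain" whose features are a rotated copy of the language structure:
    # absolute coordinates differ, relative positions are identical.
    rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
    labels = np.repeat(np.arange(num_classes), 5)
    features = (text @ rotation)[labels]
    prototypes = class_prototypes(features, labels, num_classes)

    print("rotated structure loss:", structure_alignment_loss(prototypes, text))
    print("unrelated structure loss:",
          structure_alignment_loss(rng.normal(size=(num_classes, dim)), text))
```

Under these assumptions, a domain whose class prototypes are a rotated copy of the language embeddings incurs essentially zero loss even though no prototype lies near its text embedding, which is the sense in which domain-specific coordinates can be preserved while inter-class relationships are matched.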