LAGUNA: LAnguage Guided UNsupervised Adaptation with structured spaces

📅 2024-11-23

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Unsupervised domain adaptation (UDA) struggles to learn domain-invariant representations while also preserving domain-specific features, largely because conventional alignment methods pull semantically similar samples together in the latent space regardless of how different their domains are. To address this, we propose a language-guided structured alignment paradigm: using the class-level semantic structure defined by pre-trained vision-language models as a prior, we constrain the relative positions of equivalent concepts in the latent space, rather than their absolute coordinates, to remain consistent across domains. Our approach integrates multimodal embeddings, structured contrastive learning, semantic relation distillation, and unsupervised feature disentanglement. Evaluated across 18 cross-domain scenarios on DomainNet, GeoPlaces, GeoImnet, and EgoExo4D, our method outperforms previous works, with average improvements ranging from +1.94% (mean class accuracy on EgoExo4D) to +5.75% (accuracy on GeoPlaces). The method is designed to improve cross-domain generalization while retaining intra-domain discriminability.

📝 Abstract

Unsupervised domain adaptation remains a critical challenge in transferring the knowledge of models to unseen domains. Existing methods struggle to balance the need for domain-invariant representations with the preservation of domain-specific features, often because their alignment approaches force samples with similar semantics to be projected close together in the latent space despite drastic domain differences. We introduce LAGUNA (LAnguage Guided UNsupervised Adaptation with structured spaces), a novel approach that shifts the focus from aligning representations in absolute coordinates to aligning the relative positioning of equivalent concepts in latent spaces. LAGUNA defines a domain-agnostic structure based on the semantic/geometric relationships between class labels in language space and uses it to guide adaptation, ensuring that the organization of samples in visual space reflects the reference inter-class relationships while preserving domain-specific characteristics. We empirically demonstrate LAGUNA's superiority in domain adaptation tasks: it surpasses previous works in 18 different adaptation scenarios across four diverse image and video datasets, with average accuracy improvements of +3.32% on DomainNet, +5.75% on GeoPlaces, and +4.77% on GeoImnet, and a +1.94% mean class accuracy improvement on EgoExo4D.
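
To make the idea of a language-defined reference structure more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that class text embeddings are already available as plain vectors, that visual features carry labels (on an unlabeled target domain these would have to be pseudo-labels, which is an assumption here), and that the structure is compared through cosine-similarity matrices with a mean squared error. The names `class_prototypes` and `structure_loss` are hypothetical, and the abstract does not specify the actual loss.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def similarity_matrix(vectors):
    """Pairwise cosine similarities: the 'relative positioning' of the vectors."""
    return [[cosine(u, v) for v in vectors] for u in vectors]


def class_prototypes(features, labels, num_classes):
    """Mean visual feature per class (labels may be pseudo-labels; assumption)."""
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for feat, y in zip(features, labels):
        counts[y] += 1
        for i, x in enumerate(feat):
            sums[y][i] += x
    return [[s / max(c, 1) for s in row] for row, c in zip(sums, counts)]


def structure_loss(visual_protos, text_embeddings):
    """Mean squared gap between visual and language inter-class similarities.

    Illustrative choice only: it penalizes visual class layouts whose
    inter-class relationships differ from those in language space, without
    asking the visual prototypes to sit at any particular coordinates.
    """
    s_vis = similarity_matrix(visual_protos)
    s_txt = similarity_matrix(text_embeddings)
    n = len(s_vis)
    total = sum((s_vis[i][j] - s_txt[i][j]) ** 2
                for i in range(n) for j in range(n))
    return total / (n * n)


# Toy usage: three classes, 2-D embeddings (made-up numbers).
text_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
features = [[0.0, 1.0], [0.0, 3.0], [-2.0, 0.0], [-1.0, 1.0]]
labels = [0, 0, 1, 2]
prototypes = class_prototypes(features, labels, num_classes=3)
print(structure_loss(prototypes, text_embeddings))
```

In this toy example the visual prototypes are a rotated copy of the text embeddings, so their inter-class similarities match and the loss is close to zero even though no prototype coincides with its text embedding.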

Problem

Research questions and friction points this paper is trying to address.

Balancing domain-invariant and domain-specific features in adaptation

Aligning relative concept positioning in latent spaces

Guiding adaptation with language-space semantic relationships

Innovation

Methods, ideas, or system contributions that make the work stand out.

Aligns relative positioning in latent spaces

Uses language-guided domain-agnostic structure

Preserves domain-specific characteristics adaptively
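
The contrast between absolute and relative alignment listed above can be illustrated with a small numerical sketch. This is an assumption-laden toy, not the paper's procedure: the "domain shift" is modeled as a plain 2-D rotation of class prototypes, and the helper names `absolute_gap` and `relative_gap` are hypothetical.

```python
import math


def rotate(v, theta):
    """Rotate a 2-D vector by theta radians (stand-in for a domain shift)."""
    x, y = v
    c, s = math.cos(theta), math.sin(theta)
    return [x * c - y * s, x * s + y * c]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def absolute_gap(a, b):
    """Mean squared distance between corresponding prototypes (absolute coordinates)."""
    return sum(sum((x - y) ** 2 for x, y in zip(u, v))
               for u, v in zip(a, b)) / len(a)


def relative_gap(a, b):
    """Mean squared difference between the two pairwise-similarity structures."""
    n = len(a)
    return sum((cosine(a[i], a[j]) - cosine(b[i], b[j])) ** 2
               for i in range(n) for j in range(n)) / (n * n)


# Class prototypes in a source domain and a rotated copy as the target domain.
source = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
target = [rotate(p, math.pi / 3) for p in source]

print("absolute gap:", absolute_gap(source, target))  # clearly above zero
print("relative gap:", relative_gap(source, target))  # approximately zero
```

An objective built on the absolute gap would force the target prototypes back onto the source coordinates, whereas one built on the relative gap is already satisfied here, which leaves room for the domain-specific offset to remain in the representation.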
