LAGUNA: LAnguage Guided UNsupervised Adaptation with structured spaces

📅 2024-11-23
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Unsupervised domain adaptation (UDA) struggles to achieve domain-invariant representations while also preserving domain-specific features, primarily because conventional alignment methods force semantically similar samples close together in the latent space and ignore their intrinsic domain disparities. To address this, we propose a language-guided structured alignment paradigm: using class-level semantic structures defined by pre-trained vision-language models as priors, we constrain the relative positional relationships of equivalent concepts in the latent space, rather than their absolute coordinates, to remain consistent across domains. Our approach integrates multimodal embeddings, structured contrastive learning, semantic relation distillation, and unsupervised feature disentanglement. Evaluated across 18 cross-domain scenarios on DomainNet, GeoPlaces, GeoImnet, and EgoExo4D, our method achieves state-of-the-art performance, with average improvements ranging from +1.94% (mean class accuracy on EgoExo4D) to +5.75% (accuracy on GeoPlaces). It is the first to jointly optimize cross-domain generalization and intra-domain discriminability.

📝 Abstract
Unsupervised domain adaptation remains a critical challenge in transferring model knowledge to unseen domains. Existing methods struggle to balance the need for domain-invariant representations with the preservation of domain-specific features, often because their alignment approaches force samples with similar semantics to be projected close together in the latent space despite drastic domain differences. We introduce LAGUNA - LAnguage Guided UNsupervised Adaptation with structured spaces, a novel approach that shifts the focus from aligning representations in absolute coordinates to aligning the relative positioning of equivalent concepts in latent spaces. LAGUNA defines a domain-agnostic structure based on the semantic/geometric relationships between class labels in language space and uses it to guide adaptation, ensuring that the organization of samples in visual space reflects the reference inter-class relationships while preserving domain-specific characteristics. We empirically demonstrate LAGUNA's superiority in domain adaptation tasks across four diverse image and video datasets. LAGUNA surpasses previous works in 18 different adaptation scenarios, with average accuracy improvements of +3.32% on DomainNet, +5.75% on GeoPlaces, and +4.77% on GeoImnet, and a mean class accuracy improvement of +1.94% on EgoExo4D.
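The idea of matching inter-class relationships in visual space to a reference structure from language space can be illustrated with a small NumPy sketch. This is not LAGUNA's actual objective, which the abstract does not specify. It assumes, purely for illustration, that the structure is a class-by-class cosine-similarity matrix, that visual classes are summarized by mean features (prototypes), and that the mismatch is measured with a mean squared error. All function names are hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(x):
    """Pairwise cosine similarity between the rows of x (shape: n x d)."""
    normed = x / np.linalg.norm(x, axis=1, keepdims=True)
    return normed @ normed.T


def class_prototypes(features, labels, num_classes):
    """Mean visual feature per class.

    Assumes every class has at least one sample; in an unsupervised
    target domain the labels would have to be pseudo-labels.
    """
    return np.stack([features[labels == c].mean(axis=0) for c in range(num_classes)])


def structure_alignment_loss(text_embeddings, features, labels):
    """Mean squared gap between language-space and visual-space class structures.

    Only the two (num_classes x num_classes) similarity matrices are compared,
    so the text and visual embeddings may have different dimensionalities.
    """
    num_classes = text_embeddings.shape[0]
    reference = cosine_similarity_matrix(text_embeddings)
    visual = cosine_similarity_matrix(class_prototypes(features, labels, num_classes))
    return float(np.mean((reference - visual) ** 2))


# Hypothetical data: 3 classes, 8-dim text embeddings, 5-dim visual features.
rng = np.random.default_rng(0)
text_embeddings = rng.normal(size=(3, 8))
labels = np.repeat(np.arange(3), 10)
features = rng.normal(size=(30, 5))
print(structure_alignment_loss(text_embeddings, features, labels))
```

Because the loss depends only on similarities between classes, two domains can occupy different regions of the visual space and still incur zero loss, as long as their classes are arranged in the same way relative to one another.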
Problem

Research questions and friction points this paper is trying to address.

Balancing domain-invariant and domain-specific features in adaptation
Aligning relative concept positioning in latent spaces
Guiding adaptation with language-space semantic relationships
Innovation

Methods, ideas, or system contributions that make the work stand out.

Aligns relative positioning in latent spaces
Uses language-guided domain-agnostic structure
Preserves domain-specific characteristics adaptively
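The difference between absolute and relative alignment can be seen in a toy example. The sketch below is a generic illustration, not the paper's procedure: it assumes that a sample's relative position is described by its cosine similarity to a set of class anchors, and it models the domain gap as a random orthogonal transform of the latent space. The names and the synthetic data are hypothetical.

```python
import numpy as np


def relative_representation(points, anchors):
    """Cosine similarity of each point to each anchor (shape: n_points x n_anchors)."""
    p = points / np.linalg.norm(points, axis=1, keepdims=True)
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    return p @ a.T


rng = np.random.default_rng(0)

# Hypothetical source-domain latents: 5 samples and 3 class anchors in 4 dimensions.
source_points = rng.normal(size=(5, 4))
source_anchors = rng.normal(size=(3, 4))

# Hypothetical target domain: the same content under an orthogonal transform.
orthogonal, _ = np.linalg.qr(rng.normal(size=(4, 4)))
target_points = source_points @ orthogonal
target_anchors = source_anchors @ orthogonal

# Absolute coordinates differ between the two domains...
absolute_gap = np.abs(source_points - target_points).max()

# ...while positions relative to the anchors agree up to floating-point error.
relative_gap = np.abs(
    relative_representation(source_points, source_anchors)
    - relative_representation(target_points, target_anchors)
).max()

print(f"max absolute-coordinate gap: {absolute_gap:.3f}")
print(f"max relative-position gap:   {relative_gap:.3e}")
```

In this toy setting an absolute alignment objective would have to undo the transform, whereas an objective defined on relative positions is already satisfied and leaves each domain's own geometry untouched.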
Anxhelo Diko
Sapienza University of Rome
Antonino Furnari
Assistant Professor at the University of Catania
Computer Vision
Luigi Cinque
Sapienza
Computer Vision
G. Farinella
University of Catania