Text-Phase Synergy Network with Dual Priors for Unsupervised Cross-Domain Image Retrieval

📅 2026-03-13

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This work addresses semantic degradation in unsupervised cross-domain image retrieval, which arises from inaccurate pseudo-label semantics and entanglement between domain-specific and semantic information. To mitigate these issues, the authors propose a dual-prior collaboration mechanism that leverages CLIP-generated domain-specific textual prompts to provide precise semantic supervision, while simultaneously introducing domain-invariant phase features to disentangle domain and semantic representations. By synergistically integrating these two components, the method optimizes cross-domain feature representations without requiring labeled data, thereby preserving semantic integrity and aligning feature distributions across domains. Extensive experiments demonstrate that the proposed approach significantly outperforms state-of-the-art methods on multiple unsupervised cross-domain image retrieval benchmarks.


📝 Abstract

This paper studies unsupervised cross-domain image retrieval (UCDIR), which aims to retrieve images of the same category across different domains without relying on labeled data. Existing methods typically use pseudo-labels derived from clustering algorithms as supervisory signals for intra-domain representation learning and cross-domain feature alignment. However, these discrete pseudo-labels often fail to provide accurate and comprehensive semantic guidance. Moreover, the alignment process frequently overlooks the entanglement between domain-specific and semantic information, which degrades the semantics of the learned representations and ultimately impairs retrieval performance. This paper addresses these limitations by proposing a Text-Phase Synergy Network with Dual Priors (TPSNet). Specifically, we first employ CLIP to generate a set of class-specific prompts per domain, termed domain prompts, which serve as a text prior that offers more precise semantic supervision. In parallel, we introduce a phase prior, represented by domain-invariant phase features, which is integrated into the original image representations to bridge the domain distribution gap while preserving semantic integrity. Leveraging the synergy of these dual priors, TPSNet significantly outperforms state-of-the-art methods on UCDIR benchmarks.
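The abstract does not say how the phase features are computed. As a rough illustration of why phase can act as a domain-invariant signal, the sketch below assumes the phase is the phase spectrum of a 2-D discrete Fourier transform, a common choice in domain adaptation work but not confirmed here as TPSNet's formulation. The function name `phase_only_image` is hypothetical, and the input is assumed to be a single-channel image stored as a NumPy array.

```python
import numpy as np


def phase_only_image(image: np.ndarray) -> np.ndarray:
    """Rebuild an image from its Fourier phase alone (assumed formulation).

    The amplitude spectrum is replaced by 1 everywhere, so global changes
    in amplitude (for example a contrast rescaling) are discarded, while
    the structural layout encoded in the phase is kept.
    """
    spectrum = np.fft.fft2(image)            # complex 2-D spectrum
    phase = np.angle(spectrum)               # phase in radians, same shape
    unit_spectrum = np.exp(1j * phase)       # unit amplitude, original phase
    # The input is real, so the result is real up to rounding error.
    return np.real(np.fft.ifft2(unit_spectrum))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((32, 32))               # stand-in for a grayscale image
    rescaled = 2.0 * img                     # same content, different amplitude
    same = np.allclose(phase_only_image(img), phase_only_image(rescaled))
    print("phase-only reconstruction unchanged by rescaling:", same)
```

Multiplying an image by a positive constant scales only the amplitude spectrum, so both inputs yield the same phase-only reconstruction. Real domain shifts (sketch versus photo, for instance) are far less simple, which is presumably why the paper combines the phase prior with the learned image representation rather than using it alone.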

Problem

Research questions and friction points this paper is trying to address.

unsupervised cross-domain image retrieval

pseudo-labels

semantic degradation

domain alignment

semantic guidance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Text-Phase Synergy

Dual Priors

Unsupervised Cross-Domain Image Retrieval