Text-Phase Synergy Network with Dual Priors for Unsupervised Cross-Domain Image Retrieval

📅 2026-03-13
🤖 AI Summary
This work addresses semantic degradation in unsupervised cross-domain image retrieval, which arises from inaccurate pseudo-label semantics and entanglement between domain-specific and semantic information. To mitigate these issues, the authors propose a dual-prior collaboration mechanism that leverages CLIP-generated domain-specific textual prompts to provide precise semantic supervision, while simultaneously introducing domain-invariant phase features to disentangle domain and semantic representations. By synergistically integrating these two components, the method optimizes cross-domain feature representations without requiring labeled data, thereby preserving semantic integrity and aligning feature distributions across domains. Extensive experiments demonstrate that the proposed approach significantly outperforms state-of-the-art methods on multiple unsupervised cross-domain image retrieval benchmarks.

📝 Abstract
This paper studies unsupervised cross-domain image retrieval (UCDIR), which aims to retrieve images of the same category across different domains without relying on labeled data. Existing methods typically use pseudo-labels, derived from clustering algorithms, as supervisory signals for intra-domain representation learning and cross-domain feature alignment. However, these discrete pseudo-labels often fail to provide accurate and comprehensive semantic guidance. Moreover, the alignment process frequently overlooks the entanglement between domain-specific and semantic information, leading to semantic degradation in the learned representations and ultimately impairing retrieval performance. This paper addresses these limitations by proposing a Text-Phase Synergy Network with Dual Priors (TPSNet). Specifically, we first employ CLIP to generate a set of class-specific prompts per domain, termed domain prompts, which serve as a text prior that offers more precise semantic supervision. In parallel, we introduce a phase prior, represented by domain-invariant phase features, which is integrated into the original image representations to bridge domain distribution gaps while preserving semantic integrity. Leveraging the synergy of these dual priors, TPSNet significantly outperforms state-of-the-art methods on UCDIR benchmarks.
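The abstract does not spell out how the phase prior is computed, so the following is only a minimal sketch of the general signal-processing intuition behind treating Fourier phase as domain-invariant, not TPSNet's actual formulation. It assumes a tiny grayscale image, a naive 2D DFT, and a global contrast and brightness change as a stand-in for a domain shift; the names `dft2`, `phase_only`, and `max_gap` are hypothetical helpers introduced for illustration.

```python
import cmath
import math


def dft2(img):
    """Naive 2D discrete Fourier transform of a small grayscale image (list of rows)."""
    h, w = len(img), len(img[0])
    out = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            total = 0j
            for y in range(h):
                for x in range(w):
                    angle = -2 * math.pi * (u * y / h + v * x / w)
                    total += img[y][x] * cmath.exp(1j * angle)
            out[u][v] = total
    return out


def phase_only(img, eps=1e-9):
    """Unit-magnitude spectrum: keeps each frequency's phase and discards its amplitude.

    Coefficients that are numerically zero carry no meaningful phase and are set to 0.
    """
    return [[c / abs(c) if abs(c) > eps else 0j for c in row] for row in dft2(img)]


def max_gap(a, b):
    """Largest element-wise distance between two complex spectra."""
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


# Toy image (assumed data) and a "restyled" copy with different contrast and brightness.
img = [
    [1, 2, 0, 3],
    [4, 0, 1, 2],
    [0, 5, 2, 1],
    [3, 1, 4, 0],
]
styled = [[2.5 * p + 10 for p in row] for row in img]

# The amplitude spectrum changes with the restyling, but the phase does not.
gap = max_gap(phase_only(img), phase_only(styled))
amp_before = abs(dft2(img)[0][0])
amp_after = abs(dft2(styled)[0][0])
```

In this toy setting `gap` stays at floating-point noise level while the amplitudes differ, which is why phase is often regarded as carrying structure rather than appearance. Real domain shifts (photo versus sketch, for example) are far less benign than a linear intensity change, so this shows the motivation only.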
Problem

Research questions and friction points this paper is trying to address.

unsupervised cross-domain image retrieval
pseudo-labels
semantic degradation
domain alignment
semantic guidance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Text-Phase Synergy
Dual Priors
Unsupervised Cross-Domain Image Retrieval
Domain-Invariant Phase Features
CLIP-based Semantic Prompting
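
To picture how text prompts could give softer supervision than a discrete cluster ID, here is a hedged sketch of the standard CLIP-style scoring step: cosine similarity between an image embedding and a set of prompt embeddings, followed by a temperature-scaled softmax. Everything concrete in it is an assumption for illustration: the prompt template, the class names, the toy three-dimensional vectors standing in for encoder outputs, and the temperature value. It is not the prompt-generation procedure used in the paper.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def soft_targets(image_vec, prompt_vecs, temperature=0.07):
    """Softmax over image-to-prompt similarities (temperature is an assumed value)."""
    logits = [cosine(image_vec, p) / temperature for p in prompt_vecs]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical prompts for one domain; template and class names are placeholders.
domain, classes = "sketch", ["dog", "car", "tree"]
prompts = [f"a {domain} of a {c}" for c in classes]

# Toy stand-ins for the embeddings a text encoder and an image encoder would produce.
prompt_vecs = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
image_vec = [0.9, 0.2, 0.1]

probs = soft_targets(image_vec, prompt_vecs)
best = prompts[probs.index(max(probs))]
```

The resulting `probs` is a distribution over prompts instead of a single hard label, which is one way to read the claim that a text prior offers richer semantic guidance than clustering alone.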
Jing Yang
School of Computer Science and Engineering, Southeast University, China; Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China
Hui Xue
Southeast University
machine learning
Shipeng Zhu
School of Computer Science and Engineering, Southeast University, China; Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China
Pengfei Fang
Southeast University | Monash University | Australian National University | Data61/CSIRO
Machine Learning, Deep Learning, Computer Vision