DUDE: Diffusion-Based Unsupervised Cross-Domain Image Retrieval

📅 2025-09-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
In unsupervised cross-domain image retrieval (UCIR), semantic object features are entangled with domain-specific style features, which hinders cross-domain matching. To address this, we propose the first diffusion-model-based feature disentanglement framework for UCIR. Our method uses text-to-image generation priors to explicitly decouple semantic object representations from domain style, applies a cross-domain mutual nearest neighbor mechanism for progressive feature alignment, and improves discriminability through unsupervised contrastive learning. The core contribution is bringing generative modeling, specifically diffusion-based priors, into UCIR so that cross-domain semantic alignment is driven by disentangled features. Experiments on three standard benchmarks covering 13 domains show improvements over state-of-the-art methods, supporting the effectiveness and generality of generative priors for unsupervised cross-domain retrieval.
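The summary mentions unsupervised contrastive learning without naming a specific objective. As an illustration only, the sketch below uses an InfoNCE-style loss, a common choice that is assumed here rather than taken from the paper. The function names, the temperature value, and the toy vectors are all hypothetical, and plain Python is used so the example runs without extra dependencies.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for one anchor (assumed objective, not the paper's).

    The loss is low when the anchor is closer to its positive than to
    any of the negatives.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all logits.
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(x - top) for x in logits))
    # Negative log-probability assigned to the positive pair.
    return log_sum - logits[0]


# Toy features: the positive is nearly aligned with the anchor.
anchor = [1.0, 0.0]
easy_loss = info_nce(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
# Swapping in an orthogonal "positive" should make the loss larger.
hard_loss = info_nce(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
```

In a UCIR setting, the positive would presumably be a feature believed to share the anchor's category, such as an augmented view or a matched neighbor; how the paper actually builds its pairs is not stated in this summary.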

📝 Abstract
Unsupervised cross-domain image retrieval (UCIR) aims to retrieve images of the same category across diverse domains without relying on annotations. Existing UCIR methods, which align cross-domain features for the entire image, often struggle with the domain gap, as the object features critical for retrieval are frequently entangled with domain-specific styles. To address this challenge, we propose DUDE, a novel UCIR method built upon feature disentanglement. In brief, DUDE leverages a text-to-image generative model to disentangle object features from domain-specific styles, thus facilitating semantic image retrieval. To further achieve reliable alignment of the disentangled object features, DUDE aligns mutual neighbors progressively, first within domains and then across domains. Extensive experiments demonstrate that DUDE achieves state-of-the-art performance on three benchmark datasets spanning 13 domains. The code will be released.
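To make the mutual-neighbor idea concrete, here is a minimal sketch of cross-domain mutual nearest neighbor matching over already-extracted feature vectors. It is not the authors' implementation: the cosine similarity measure, the function names, and the toy features are assumptions for illustration, and the progressive within-domain-to-cross-domain schedule described in the abstract is not modeled.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(query, pool):
    """Index of the pool vector most similar to the query."""
    return max(range(len(pool)), key=lambda j: cosine(query, pool[j]))


def mutual_nearest_neighbors(feats_a, feats_b):
    """Return (i, j) pairs where a_i and b_j are each other's nearest neighbor."""
    a_to_b = [nearest(f, feats_b) for f in feats_a]
    b_to_a = [nearest(f, feats_a) for f in feats_b]
    # Keep a pair only if the match holds in both directions.
    return [(i, j) for i, j in enumerate(a_to_b) if b_to_a[j] == i]


# Hypothetical object features from two domains (e.g. photos and sketches).
feats_a = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.6]]
feats_b = [[0.9, 0.1], [0.1, 0.9]]
pairs = mutual_nearest_neighbors(feats_a, feats_b)
# pairs == [(0, 0), (1, 1)]; the third vector in feats_a points to
# feats_b[0], but feats_b[0] prefers feats_a[0], so it is left unmatched.
```

Requiring agreement in both directions filters out one-sided matches, which is why mutual neighbors are often treated as more reliable pseudo-pairs than plain nearest neighbors. A within-domain variant would additionally need to exclude each sample from its own candidate pool.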
Problem

Research questions and friction points this paper is trying to address.

Unsupervised cross-domain image retrieval without annotations
Disentangling object features from domain-specific styles
Reliably aligning the disentangled object features across domains
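The retrieval task itself can be sketched as ranking gallery features from one domain against a query feature from another. The example below is a hedged illustration, not the paper's pipeline: the cosine ranking, the precision-at-k metric, the function names, and the toy data are all assumptions. Labels appear only to score the result, which matches the unsupervised setting where no annotations are used for training.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve(query, gallery, k):
    """Indices of the k gallery features most similar to the query."""
    ranked = sorted(
        range(len(gallery)),
        key=lambda j: cosine(query, gallery[j]),
        reverse=True,
    )
    return ranked[:k]


def precision_at_k(retrieved, gallery_labels, query_label):
    """Fraction of retrieved items sharing the query's category."""
    hits = sum(1 for j in retrieved if gallery_labels[j] == query_label)
    return hits / len(retrieved)


# Hypothetical gallery features from a second domain, with labels
# that are used for evaluation only.
gallery = [[1.0, 0.0], [0.9, 0.2], [0.0, 1.0], [0.1, 1.0]]
gallery_labels = ["cat", "cat", "dog", "dog"]

query = [0.95, 0.05]  # a "cat" feature from the first domain
top2 = retrieve(query, gallery, k=2)
score = precision_at_k(top2, gallery_labels, "cat")
```

If style information leaks into the features, same-domain or same-style items can outrank same-category items in a ranking like this, which is the failure mode the disentanglement step is meant to reduce.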
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages text-to-image generative model for disentanglement
Aligns mutual neighbors across domains progressively
Disentangles object features from domain-specific styles
Authors
Ruohong Yang
College of Computer Science, Sichuan University, Chengdu, 610065, China.
Peng Hu
College of Computer Science, Sichuan University, Chengdu, 610065, China.
Yunfan Li
Sichuan University, College of Computer Science, Chengdu, China
Clustering
Xi Peng
College of Computer Science, Sichuan University, Chengdu, 610065, China.