DUDE: Diffusion-Based Unsupervised Cross-Domain Image Retrieval

📅 2025-09-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In unsupervised cross-domain image retrieval (UCIR), semantic object features are entangled with domain-specific style features, which hinders cross-domain matching. To address this, we propose the first diffusion model-based feature disentanglement framework for UCIR. Our method explicitly decouples semantic object representations from domain style using text-to-image generation priors, incorporates a cross-domain mutual nearest neighbor mechanism for progressive feature alignment, and improves discriminability via unsupervised contrastive learning. The core contribution is the integration of generative modeling, specifically diffusion-based priors, into UCIR to enable disentanglement-driven cross-domain semantic alignment. Extensive experiments on three standard benchmarks comprising 13 diverse domains show improvements over state-of-the-art methods, supporting both the efficacy and the generalizability of generative priors in unsupervised cross-domain retrieval.
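
The summary mentions unsupervised contrastive learning but does not say which objective is used. As a minimal sketch, assuming an InfoNCE-style loss over cosine similarities (the function name `info_nce`, the temperature value, and the toy vectors are all illustrative and not taken from the paper), the idea can be written as:

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor (assumed objective, not confirmed by the paper).

    The loss is the negative log-probability of picking the positive
    among the positive plus all negatives, with similarities scaled
    by the temperature.
    """
    logits = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    scaled = [s / temperature for s in logits]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(scaled)
    log_denominator = m + math.log(sum(math.exp(s - m) for s in scaled))
    return log_denominator - scaled[0]


# Toy features: the anchor is close to `good` and far from `bad`.
anchor = [1.0, 0.0]
good = [0.9, 0.1]
bad = [0.0, 1.0]

loss_aligned = info_nce(anchor, good, [bad])
loss_misaligned = info_nce(anchor, bad, [good])
```

The loss is small when the anchor is closer to its positive than to the negatives and grows when that ordering is reversed, which is the property a contrastive objective relies on to make features more discriminative.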

📝 Abstract

Unsupervised cross-domain image retrieval (UCIR) aims to retrieve images of the same category across diverse domains without relying on annotations. Existing UCIR methods, which align cross-domain features for the entire image, often struggle with the domain gap, as the object features critical for retrieval are frequently entangled with domain-specific styles. To address this challenge, we propose DUDE, a novel UCIR method built upon feature disentanglement. In brief, DUDE leverages a text-to-image generative model to disentangle object features from domain-specific styles, thus facilitating semantic image retrieval. To further achieve reliable alignment of the disentangled object features, DUDE aligns mutual neighbors progressively, first within domains and then across domains. Extensive experiments demonstrate that DUDE achieves state-of-the-art performance on three benchmark datasets spanning 13 domains. The code will be released.
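
The abstract does not spell out how mutual neighbors are found. A minimal sketch of the general idea, assuming cosine similarity over already-extracted feature vectors (the helper names and the toy `photo` and `sketch` features are hypothetical, and this covers only a single cross-domain matching step, not DUDE's progressive schedule):

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(queries, gallery):
    """For each query, the index of its most similar gallery vector."""
    return [
        max(range(len(gallery)), key=lambda j: cosine(q, gallery[j]))
        for q in queries
    ]


def mutual_nearest_neighbors(feats_a, feats_b):
    """Pairs (i, j) where a[i] and b[j] are each other's nearest neighbor."""
    a_to_b = nearest(feats_a, feats_b)
    b_to_a = nearest(feats_b, feats_a)
    return [(i, j) for i, j in enumerate(a_to_b) if b_to_a[j] == i]


# Toy object features from two domains (illustrative values only).
photo = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
sketch = [[0.9, 0.1], [0.1, 0.9]]

pairs = mutual_nearest_neighbors(photo, sketch)
print(pairs)  # [(0, 0), (1, 1)]
```

Here `photo[2]` is nearest to `sketch[1]`, but `sketch[1]` prefers `photo[1]`, so the pair is dropped. Requiring agreement in both directions is what makes mutual neighbors a more reliable alignment signal than one-way nearest neighbors when no labels are available.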

Problem

Research questions and friction points this paper is trying to address.

Unsupervised cross-domain image retrieval without annotations

Disentangling object features from domain-specific styles

Aligning mutual neighbors across domains progressively

Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages a text-to-image generative model for disentanglement

Aligns mutual neighbors across domains progressively

Disentangles object features from domain-specific styles

🔎 Similar Papers

Unsupervised Cross-Domain Image Retrieval via Prototypical Optimal Transport