🤖 AI Summary
Problem: Existing image-text matching models generalize poorly to unseen target domains in cross-domain settings.
Method: This paper proposes an unsupervised domain-invariant representation learning framework. Its core innovation is the first coupling of semantic clustering with cross-modal graph-structure alignment: K-means produces fine-grained semantic clusters that expose transferable correspondences between image local regions and text tokens; a heterogeneous graph neural network (HGNN) then models the topological structure of the two modalities; finally, a contrastive graph-matching loss enforces cross-domain alignment of these graph structures.
Contribution/Results: The method requires no target-domain annotations and substantially improves out-of-domain generalization. On multi-domain transfer benchmarks, including Flickr30K and COCO, it achieves absolute R@1 gains of 6.2–9.8%, surpassing state-of-the-art domain-generalization methods for image-text matching.
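The clustering-based correspondence step can be illustrated with a minimal sketch. This is a hypothetical toy, not the paper's code: it runs plain K-means over a shared embedding space containing both image-region and text-token features, then treats a region and a token that fall in the same cluster as a transferable cross-modal correspondence. All names, the 2-D toy features, and the cluster count are assumptions for illustration.

```python
import math
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain K-means on tuples of floats; returns a cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # initialize centers from the data
    assign = [0] * len(points)
    for _ in range(iters):
        # assignment step: each point goes to its nearest center
        for i, p in enumerate(points):
            assign[i] = min(range(k), key=lambda c: math.dist(p, centers[c]))
        # update step: each center moves to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(d) / len(members) for d in zip(*members)]
    return assign

# Toy shared embedding space (hypothetical 2-D features):
# two region/token groups that should form two semantic clusters.
regions = [(0.1, 0.0), (0.2, 0.1), (5.0, 5.1)]   # image local regions
tokens  = [(0.0, 0.2), (5.2, 4.9)]               # text tokens

labels = kmeans(regions + tokens, k=2)
region_labels = labels[:len(regions)]
token_labels = labels[len(regions):]

# Correspondence: region i matches token j when they share a cluster.
matches = [(i, j) for i, ri in enumerate(region_labels)
                  for j, tj in enumerate(token_labels) if ri == tj]
print(matches)  # e.g. [(0, 0), (1, 0), (2, 1)]
```

In the actual framework these correspondences would feed the HGNN, whose node features and cross-modal edges are then aligned across domains by the contrastive graph-matching loss; the sketch only covers the discovery of the correspondences themselves.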