Clustering-based Image-Text Graph Matching for Domain Generalization

📅 2023-10-04
🏛️ International Conference on Pattern Recognition
📈 Citations: 1
Influential: 0
🤖 AI Summary
Problem: Existing image-text matching models generalize poorly to unseen target domains in cross-domain settings. Method: This paper proposes an unsupervised framework for learning domain-invariant representations. Its core innovation is the first coupling of semantic clustering with cross-modal graph-structure alignment: K-means produces fine-grained semantic clusters, which are used to discover transferable correspondences between local image regions and text tokens; a heterogeneous graph neural network (HGNN) then models the topological structure of both modalities; finally, a contrastive graph-matching loss enforces cross-domain alignment of these graph structures. Contribution/Results: The method requires no target-domain annotations and significantly improves out-of-domain generalization. On multi-domain transfer benchmarks, including Flickr30K and COCO, it achieves absolute R@1 gains of 6.2–9.8%, surpassing state-of-the-art domain-generalization methods for image-text matching.
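The pipeline summarized above (cluster local features, form graph nodes, align the two modalities with a contrastive loss) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names, the mean-pooling readout, the temperature value, and the symmetric InfoNCE form of the loss are all assumptions made for illustration, and the HGNN message-passing step is omitted entirely.

```python
import numpy as np


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means. Returns (centroids, assignments)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct input points (requires k <= len(points)).
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    assign = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        # Squared Euclidean distance from every point to every centroid: shape (n, k).
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = points[assign == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids, assign


def graph_embedding(local_feats, k):
    """Hypothetical readout: cluster local features into k nodes, then
    mean-pool the node (centroid) features into one L2-normalised vector.
    A real model would run a graph neural network over the nodes first."""
    centroids, _ = kmeans(local_feats, k)
    pooled = centroids.mean(axis=0)
    return pooled / (np.linalg.norm(pooled) + 1e-8)


def contrastive_matching_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss (an assumed form): row i of img_emb is the
    positive match for row i of txt_emb; all other rows are negatives."""
    logits = img_emb @ txt_emb.T / temperature

    def cross_entropy_on_diagonal(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    return 0.5 * (cross_entropy_on_diagonal(logits)
                  + cross_entropy_on_diagonal(logits.T))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Toy batch: 4 image-text pairs, 12 local features each, 8 dimensions.
    img = np.stack([graph_embedding(rng.normal(size=(12, 8)), k=3) for _ in range(4)])
    txt = np.stack([graph_embedding(rng.normal(size=(12, 8)), k=3) for _ in range(4)])
    print("loss on random toy data:", contrastive_matching_loss(img, txt))
```

With correctly paired embeddings the loss approaches zero, and it grows when pairs are mismatched, which is the behaviour a graph-matching objective of this kind relies on.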
Problem

Research questions and friction points this paper is trying to address.

Cross-domain Image Recognition
Text-Image Alignment
Visual Feature Extraction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-grained Alignment
Cross-modal Feature Mapping
Graph-based Representation