Robust Visual Representation Learning with Multi-modal Prior Knowledge for Image Classification Under Distribution Shift

📅 2024-10-21
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address degraded generalization in image classification under distribution shift, this paper proposes Knowledge-Guided Visual Representation Learning (KGV). KGV models knowledge graph nodes as Gaussian distributions and relations as translation operations to construct structured semantic priors; it further incorporates visual priors from generated synthetic images, enabling multi-modal embedding alignment and representation regularization within a unified latent space. By integrating knowledge graph embedding, Gaussian distribution alignment, and generative modeling, KGV substantially improves model robustness across diverse distribution-shift benchmarks, including cross-country traffic sign recognition, mini-ImageNet variants, and DVM-CAR. Experimental results demonstrate an average 4.2% improvement in classification accuracy, a 37% gain in few-shot data efficiency, and consistently stronger generalization than state-of-the-art baselines.

📝 Abstract
Despite the remarkable success of deep neural networks (DNNs) in computer vision, they fail to remain high-performing when facing distribution shifts between training and testing data. In this paper, we propose Knowledge-Guided Visual representation learning (KGV) - a distribution-based learning approach leveraging multi-modal prior knowledge - to improve generalization under distribution shift. It integrates knowledge from two distinct modalities: 1) a knowledge graph (KG) with hierarchical and association relationships; and 2) generated synthetic images of visual elements semantically represented in the KG. The respective embeddings are generated from the given modalities in a common latent space, i.e., visual embeddings from original and synthetic images as well as knowledge graph embeddings (KGEs). These embeddings are aligned via a novel variant of translation-based KGE methods, where the node and relation embeddings of the KG are modeled as Gaussian distributions and translations, respectively. We claim that incorporating multi-modal prior knowledge enables more regularized learning of image representations. Thus, the models are able to better generalize across different data distributions. We evaluate KGV on different image classification tasks with major or minor distribution shifts, namely road sign classification across datasets from Germany, China, and Russia, image classification with the mini-ImageNet dataset and its variants, as well as the DVM-CAR dataset. The results demonstrate that KGV consistently exhibits higher accuracy and data efficiency across all experiments.
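The abstract's core idea - KG nodes modeled as Gaussian distributions, relations as translations - can be sketched as follows. This is a minimal illustration, not the paper's implementation: the diagonal-Gaussian parameterization, the KL-divergence-based triple score, and all variable names are assumptions made here for clarity.

```python
import numpy as np

def kl_diag_gaussians(mu1, var1, mu2, var2):
    """KL divergence between diagonal Gaussians N(mu1, var1) and N(mu2, var2)."""
    return 0.5 * np.sum(var1 / var2 + (mu2 - mu1) ** 2 / var2
                        - 1.0 + np.log(var2 / var1))

def translation_score(head_mu, head_var, rel_vec, tail_mu, tail_var):
    """Score a (head, relation, tail) triple: translate the head Gaussian's
    mean by the relation vector, then measure its divergence from the tail
    Gaussian. Lower score = more plausible triple."""
    return kl_diag_gaussians(head_mu + rel_vec, head_var, tail_mu, tail_var)

# Toy example: a 'stop_sign' node, an 'is_a' relation, a 'traffic_sign' node.
rng = np.random.default_rng(0)
dim = 8
stop_mu, stop_var = rng.normal(size=dim), np.full(dim, 0.1)
rel = rng.normal(size=dim) * 0.01
# A tail that (nearly) satisfies the translation, and an unrelated one.
sign_mu, sign_var = stop_mu + rel, np.full(dim, 0.1)
other_mu, other_var = rng.normal(size=dim), np.full(dim, 0.1)

good = translation_score(stop_mu, stop_var, rel, sign_mu, sign_var)
bad = translation_score(stop_mu, stop_var, rel, other_mu, other_var)
print(good < bad)  # the translated head lands closer to the true tail
```

In a training loop, such a score would be minimized for observed KG triples and maximized for corrupted (negative) triples, which is the usual pattern for translation-based KGE methods.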
Problem

Research questions and friction points this paper is trying to address.

Improving generalization under distribution shift
Leveraging multi-modal prior knowledge
Enhancing image classification accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-modal prior knowledge integration
Translation-based knowledge graph embeddings
Synthetic and original image embeddings alignment
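The alignment of visual embeddings (from both original and synthetic images) with the Gaussian class nodes can be illustrated with a simple likelihood-based objective. Again a hedged sketch under assumptions made here: the diagonal-Gaussian negative log-likelihood and all names are illustrative, not the paper's actual loss.

```python
import numpy as np

def gaussian_nll(x, mu, var):
    """Negative log-likelihood of an image embedding x under the diagonal
    Gaussian of its class node. Minimizing this pulls visual embeddings
    (real or synthetic) toward the class's region of the shared latent
    space, regularizing the learned representation."""
    return 0.5 * np.sum((x - mu) ** 2 / var + np.log(2.0 * np.pi * var))

rng = np.random.default_rng(1)
dim = 8
class_mu, class_var = np.zeros(dim), np.ones(dim)
near = rng.normal(scale=0.1, size=dim)  # embedding close to the class mean
far = rng.normal(scale=3.0, size=dim)   # embedding far from the class mean
print(gaussian_nll(near, class_mu, class_var)
      < gaussian_nll(far, class_mu, class_var))
```

Embeddings that fall inside their class Gaussian incur a low penalty, so the objective ties the visual encoder to the structure imposed by the knowledge graph prior.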