DiffusionCom: Structure-Aware Multimodal Diffusion Model for Multimodal Knowledge Graph Completion

📅 2025-04-09
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing discriminative multimodal knowledge graph completion (MKGC) methods struggle to capture complex structural dependencies and fine-grained multimodal correlations. To address this, this paper pioneers the integration of generative diffusion models into MKGC, directly modeling the joint probability distribution over (head, relation) and tail entities, and generating candidate tails via iterative denoising. We propose a structure-aware multimodal diffusion framework featuring: (i) Structure-MKGformer, an encoder that jointly leverages graph attention and adaptive multimodal feature fusion; (ii) a Multimodal Graph Attention Network (MGAT) for cross-modal structural reasoning; and (iii) a unified generative-discriminative training paradigm. Extensive experiments on FB15k-237-IMG and WN18-IMG demonstrate substantial improvements over state-of-the-art methods, validating the effectiveness of synergistically combining generative modeling with structural awareness for MKGC.
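The iterative-denoising idea in the summary can be illustrated with a toy sketch. Everything below — the number of candidates, the noise schedule, and especially the oracle denoiser — is an illustrative assumption, not the paper's architecture: a trained model would condition the denoiser on the (head, relation) encoding instead of being handed the answer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: the "clean" signal x0 is a score vector over candidate tails.
num_tails, T = 5, 50
betas = np.linspace(1e-4, 0.2, T)      # linear noise schedule (assumed)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

def q_sample(x0, t):
    """Forward process: corrupt x0 with Gaussian noise at step t."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1 - alpha_bar[t]) * eps

def toy_denoiser(x_t, t, x0):
    """Oracle stand-in for the learned network: predicts the clean scores.
    The real model would condition on the (head, relation) pair."""
    return x0

def p_sample_loop(x0):
    """Reverse process: start from pure noise and iteratively denoise."""
    x = rng.standard_normal(num_tails)
    for t in reversed(range(T)):
        x0_hat = toy_denoiser(x, t, x0)
        if t == 0:
            return x0_hat
        # Posterior mean of q(x_{t-1} | x_t, x0_hat)
        coef0 = np.sqrt(alpha_bar[t - 1]) * betas[t] / (1 - alpha_bar[t])
        coeft = np.sqrt(alphas[t]) * (1 - alpha_bar[t - 1]) / (1 - alpha_bar[t])
        x = coef0 * x0_hat + coeft * x + np.sqrt(betas[t]) * rng.standard_normal(num_tails)
    return x

x0 = np.array([0., 0., 1., 0., 0.])    # ground-truth tail is index 2
x_noisy = q_sample(x0, T - 1)          # forward: clean scores -> near-pure noise
scores = p_sample_loop(x0)             # reverse: noise -> candidate scores
print(int(np.argmax(scores)))          # → 2 (top-ranked candidate tail)
```

With the oracle denoiser the reverse chain recovers the clean score vector exactly; the point of the sketch is only the mechanics of generating a candidate ranking from noise.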

📝 Abstract
Most current MKGC approaches are predominantly based on discriminative models that maximize conditional likelihood. These approaches struggle to efficiently capture the complex connections in real-world knowledge graphs, thereby limiting their overall performance. To address this issue, we propose a structure-aware multimodal Diffusion model for multimodal knowledge graph Completion (DiffusionCom). DiffusionCom innovatively approaches the problem from the perspective of generative models, modeling the association between the $(head, relation)$ pair and candidate tail entities as their joint probability distribution $p((head, relation), (tail))$, and framing the MKGC task as a process of gradually generating the joint probability distribution from noise. Furthermore, to fully leverage the structural information in MKGs, we propose Structure-MKGformer, an adaptive and structure-aware multimodal knowledge representation learning method, as the encoder for DiffusionCom. Structure-MKGformer captures rich structural information through a multimodal graph attention network (MGAT) and adaptively fuses it with entity representations, thereby enhancing the structural awareness of these representations. This design effectively addresses the limitations of existing MKGC methods, particularly those based on multimodal pre-trained models, in utilizing structural information. DiffusionCom is trained using both generative and discriminative losses for the generator, while the feature extractor is optimized exclusively with discriminative loss. This dual approach allows DiffusionCom to harness the strengths of both generative and discriminative models. Extensive experiments on the FB15k-237-IMG and WN18-IMG datasets demonstrate that DiffusionCom outperforms state-of-the-art models.
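The dual training scheme described in the abstract — generative plus discriminative loss for the generator, discriminative loss only for the feature extractor — can be sketched as follows. The concrete loss forms (MSE denoising loss, cross-entropy over tails), the numeric values, and the weighting `lam` are hypothetical stand-ins, not taken from the paper.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def discriminative_loss(scores, gold):
    """Cross-entropy over candidate tail entities."""
    return float(-np.log(softmax(scores)[gold]))

def generative_loss(eps_pred, eps_true):
    """Denoising objective: MSE between predicted and true noise."""
    return float(np.mean((eps_pred - eps_true) ** 2))

# Illustrative values (not from the paper)
scores = np.array([0.1, 0.2, 2.5, 0.0])   # generator's scores over tails
gold = 2                                   # index of the true tail entity
eps_true = np.array([0.3, -0.5, 0.1, 0.2])
eps_pred = np.array([0.25, -0.4, 0.0, 0.3])

lam = 0.5  # hypothetical weighting between the two objectives
# Generator: optimized with both losses
generator_loss = generative_loss(eps_pred, eps_true) + lam * discriminative_loss(scores, gold)
# Feature extractor: optimized with the discriminative loss only
encoder_loss = discriminative_loss(scores, gold)
```

The split mirrors the abstract's claim: the generator benefits from both objectives, while the encoder's parameters receive gradients only from the discriminative term.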
Problem

Research questions and friction points this paper is trying to address.

Discriminative MKGC models struggle to capture the complex connections in real-world knowledge graphs
MKGC has not previously been framed as generating a joint probability distribution over (head, relation) pairs and tail entities from noise
Existing multimodal pre-trained encoders underuse structural information, limiting the structural awareness of entity representations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative diffusion model for MKGC
Structure-MKGformer with multimodal attention
Dual generative-discriminative loss training
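The second bullet's mechanism — attention pooling over graph neighbors followed by adaptive fusion into the entity representation — can be sketched minimally. A single dot-product attention head and a scalar sigmoid gate are both simplifying assumptions, not the paper's exact MGAT formulation.

```python
import numpy as np

rng = np.random.default_rng(1)

def graph_attention(node, neighbors):
    """Single-head dot-product attention pooling over neighbor features."""
    logits = neighbors @ node                  # relevance of each neighbor
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                   # softmax attention weights
    return weights @ neighbors                 # attention-weighted context

def adaptive_fuse(entity, structural):
    """Scalar sigmoid gate that adaptively mixes structural context
    into the entity representation (illustrative fusion rule)."""
    gate = 1.0 / (1.0 + np.exp(-float(entity @ structural)))
    return gate * structural + (1 - gate) * entity

d = 4
entity = rng.standard_normal(d)        # fused text+image entity feature
neighbors = rng.standard_normal((3, d))  # features of the entity's graph neighbors
context = graph_attention(entity, neighbors)
fused = adaptive_fuse(entity, context)   # structure-aware entity representation
```

The gate lets the model decide per entity how much structural signal to inject, which is the "adaptive" part of the fusion described in the abstract.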
Wei Huang
Beijing University of Posts and Telecommunications, Beijing, China
Meiyu Liang
Beijing University of Posts and Telecommunications, Beijing, China
Peining Li
Beijing University of Posts and Telecommunications, Beijing, China
Xu Hou
Yawen Li
Junping Du
Beijing University of Posts and Telecommunications
Zhe Xue
Beijing University of Posts and Telecommunications, Beijing, China
Zeli Guan
Beijing University of Posts and Telecommunications, Beijing, China