RADD: Retrieval-Augmented Discrete Diffusion for Multi-Modal Knowledge Graph Completion

📅 2026-04-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a performance bottleneck in multimodal knowledge graph completion: a single scorer is typically used for both retrieval and reranking. To this end, we propose RADD, a framework that combines retrieval augmentation with discrete diffusion. RADD decouples global retrieval from local reranking by pairing a relation-aware multimodal knowledge graph embedding retriever with a conditional discrete denoising model. Training uses denoising cross-entropy and temperature-scaled knowledge distillation, and inference uses a Diff-Rerank strategy, so that each stage keeps its own inductive bias while the two are optimized jointly. Experiments on three multimodal benchmark datasets show that RADD outperforms unimodal, multimodal, and large language model baselines, and ablation studies confirm the contribution of each component.
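To make the temperature-scaled distillation step more concrete, here is a minimal sketch of one common form of such a loss: a KL divergence between temperature-softened teacher and student distributions over a candidate shortlist. The exact loss used by RADD is not given in this summary, so the KL form, the function names (`softmax`, `distillation_loss`), and the default temperature are all assumptions made for illustration.

```python
import math


def softmax(scores, temperature=1.0):
    """Numerically stable softmax of a list of scores at a given temperature."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_scores, student_scores, temperature=2.0):
    """KL(teacher || student) over softened distributions.

    Assumed form: the source only says the distillation is
    "temperature-scaled"; the retriever plays the teacher and the
    denoiser plays the student.
    """
    p = softmax(teacher_scores, temperature)
    q = softmax(student_scores, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical scores for a three-entity shortlist.
teacher = [2.3, 2.1, 1.9]
loss_same = distillation_loss(teacher, teacher)
loss_diff = distillation_loss(teacher, [0.1, 3.0, 0.2])
sharp = softmax(teacher, temperature=1.0)
soft = softmax(teacher, temperature=4.0)
```

A higher temperature flattens both distributions, so the student is also trained on how the teacher ranks the lower-scored candidates rather than only on its top choice.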

📝 Abstract

Most multi-modal knowledge graph completion (MMKGC) models use one embedding scorer to do both retrieval over the full entity set and final decision making. We argue that this coupling is a core bottleneck: global high-recall search and local fine-grained disambiguation require different inductive biases. Therefore, we propose a Retrieval-Augmented Discrete Diffusion (RADD) framework to decouple retrieval from reranking for MMKGC. A relation-aware multimodal KGE retriever serves as both global retriever and distillation teacher, while a conditional discrete denoiser performs shortlist-level entity-identity generation for reranking. Training combines KGE supervision, denoising cross-entropy, and temperature-scaled distillation from the retriever to the denoiser. At inference, the designed Diff-Rerank first forms a top-$K$ shortlist with the retriever and then reranks it with the denoiser, ensuring that recall is a strict prerequisite for precision. Experiments on three MMKGC benchmarks show that RADD achieves the best performance and consistent gains over strong unimodal, multimodal, and LLM-based baselines, while ablations further verify the contribution of each component.
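The two-stage inference described in the abstract can be sketched as follows. This is an illustrative outline only: the function name `diff_rerank`, the toy entity names, and the scores are hypothetical, and a lookup table stands in for the conditional discrete denoiser, which in the paper is a learned diffusion model.

```python
def diff_rerank(retriever_scores, denoiser_score_fn, k):
    """Retrieve a top-k shortlist, then rerank it with a second scorer.

    retriever_scores: dict mapping each candidate entity to its retriever score.
    denoiser_score_fn: callable that takes the shortlist (a list of entities)
        and returns one score per entity; a stand-in for the denoiser.
    """
    # Stage 1: global retrieval over the full entity set.
    shortlist = sorted(retriever_scores, key=retriever_scores.get, reverse=True)[:k]
    # Stage 2: local reranking restricted to the shortlist.
    rerank_scores = denoiser_score_fn(shortlist)
    order = sorted(range(len(shortlist)), key=lambda i: rerank_scores[i], reverse=True)
    return [shortlist[i] for i in order]


# Hypothetical retriever scores over a five-entity vocabulary.
retriever_scores = {"Paris": 2.1, "Lyon": 2.3, "Berlin": 0.4, "Rome": 1.9, "Tokyo": -0.5}

# Hypothetical denoiser preferences; note the high score for "Berlin".
denoiser_table = {"Paris": 3.0, "Lyon": 1.0, "Rome": 0.5, "Berlin": 9.0, "Tokyo": 9.0}


def toy_denoiser(shortlist):
    return [denoiser_table[entity] for entity in shortlist]


ranked = diff_rerank(retriever_scores, toy_denoiser, k=3)
print(ranked)
```

In this toy run the reranker promotes "Paris" above "Lyon" inside the shortlist, while "Berlin" never appears in the output despite its high reranker score, because it was not retrieved in the first stage. That is the sense in which recall is a prerequisite for precision.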

Problem

Research questions and friction points this paper is trying to address.

multi-modal knowledge graph completion

retrieval

reranking

entity disambiguation

knowledge graph embedding

Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval-Augmented

Discrete Diffusion

Knowledge Graph Completion