PGMEL: Policy Gradient-based Generative Adversarial Network for Multimodal Entity Linking

📅 2025-10-03
🤖 AI Summary
Existing multimodal entity linking (MEL) approaches have not explored the role of high-quality hard negative samples in cross-modal representation learning. To address this, we propose PGMEL, to our knowledge the first generative adversarial framework for MEL: a generator produces challenging negative samples, while a discriminator performs metric learning over the textual and visual modalities. Because sampling negatives is a discrete, non-differentiable process, the generator is optimized with policy gradient techniques. Experiments on three benchmarks (Wiki-MEL, Richpedia-MEL, and WikiDiverse) show that PGMEL outperforms state-of-the-art methods and learns meaningful representations by selecting challenging negatives.

📝 Abstract
The task of entity linking, which involves associating mentions with their respective entities in a knowledge graph, has received significant attention due to its numerous potential applications. Recently, various multimodal entity linking (MEL) techniques have been proposed, which aim to learn comprehensive embeddings by leveraging both the text and vision modalities. The selection of high-quality negative samples can play a crucial role in metric and representation learning. However, to the best of our knowledge, this possibility remains unexplored in the existing MEL literature. To fill this gap, we address the multimodal entity linking problem in a generative adversarial setting, where the generator is responsible for generating high-quality negative samples and the discriminator is responsible for the metric learning task. Since generating samples is a discrete process, we optimize the generator using policy gradient techniques and propose a policy gradient-based generative adversarial network for multimodal entity linking (PGMEL). Experimental results on the Wiki-MEL, Richpedia-MEL, and WikiDiverse datasets demonstrate that PGMEL learns meaningful representations by selecting challenging negative samples and outperforms state-of-the-art methods.
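
The abstract notes that sampling a negative is a discrete step, which is why the generator is trained with policy gradients rather than ordinary backpropagation. The following is a minimal sketch of that idea using a REINFORCE-style update, not the authors' implementation. The function names, the per-candidate logit parameterization, the learning rate, the baseline, and the fixed reward values (standing in for a discriminator's feedback) are all assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits, rewards, lr=0.5, rng=random):
    """One REINFORCE update of the generator's logits over candidate negatives.

    logits  : generator score per candidate negative (hypothetical parameterization)
    rewards : stand-in for the feedback a discriminator would give each candidate
    """
    probs = softmax(logits)
    # Discrete sampling step: no gradient flows through this choice.
    idx = rng.choices(range(len(logits)), weights=probs, k=1)[0]
    # Baseline (expected reward under the current policy) to reduce variance.
    baseline = sum(p * r for p, r in zip(probs, rewards))
    advantage = rewards[idx] - baseline
    # d log pi(idx) / d logit_j = 1[j == idx] - probs[j]
    new_logits = [
        l + lr * advantage * ((1.0 if j == idx else 0.0) - probs[j])
        for j, l in enumerate(logits)
    ]
    return new_logits, idx


rng = random.Random(0)
logits = [0.0, 0.0, 0.0]    # three candidate negatives, initially equally likely
rewards = [0.1, 0.9, 0.2]   # assumed: candidate 1 is the hardest negative
for _ in range(200):
    logits, _ = reinforce_step(logits, rewards, rng=rng)
final_probs = softmax(logits)
```

In this toy setting the generator's probability mass shifts toward the candidate with the highest reward, which is the behavior one would want from a hard negative sampler. In an actual adversarial setup the rewards would change as the discriminator is trained.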
Problem

Research questions and friction points this paper is trying to address.

Improving multimodal entity linking via adversarial learning
Generating high-quality negative samples using policy gradients
Enhancing representation learning through challenging sample selection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative adversarial network for multimodal entity linking
Policy gradient optimizes discrete negative sample generation
Generator creates challenging negatives to improve representation learning
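
To illustrate why challenging negatives help metric learning, here is a small sketch under stated assumptions. The page does not specify the discriminator's loss or how text and image features are combined, so the concatenation-based `fuse` helper, the cosine similarity, the margin value, and the toy vectors below are all hypothetical choices, not the paper's method.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def fuse(text_vec, image_vec):
    """Hypothetical multimodal fusion: plain concatenation of the two embeddings."""
    return list(text_vec) + list(image_vec)


def triplet_loss(mention, positive, negative, margin=0.2):
    """Margin-based loss: zero once the positive beats the negative by `margin`."""
    return max(0.0, margin - cosine(mention, positive) + cosine(mention, negative))


# Toy embeddings (made up for illustration).
mention = fuse([1.0, 0.0], [0.5, 0.5])
gold_entity = fuse([0.9, 0.1], [0.5, 0.4])
easy_negative = fuse([-1.0, 0.0], [-0.5, 0.5])   # clearly unrelated entity
hard_negative = fuse([0.8, 0.3], [0.4, 0.6])     # close to the mention

easy_loss = triplet_loss(mention, gold_entity, easy_negative)
hard_loss = triplet_loss(mention, gold_entity, hard_negative)
```

With these toy vectors the easy negative already satisfies the margin and contributes no loss, while the hard negative still yields a positive loss and therefore a training signal. That is the intuition behind having a generator seek out challenging negatives.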
KM Pooja
Department of Information Technology, Indian Institute of Information Technology, Allahabad, India 211012
Cheng Long
Nanyang Technological University
databases, machine learning, data mining
Aixin Sun
School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Ave, Singapore 639798