MERLIN: A Testbed for Multilingual Multimodal Entity Recognition and Linking

📅 2025-10-16
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the absence of a unified evaluation benchmark for Multilingual Entity Recognition and Linking (MERL) in multimodal settings. We introduce the first dedicated multilingual multimodal entity linking benchmark, comprising BBC news headlines and corresponding images across five languages, annotated with over 7,000 entity mentions linked to more than 2,500 Wikidata entities. To tackle MERL, we propose a joint modeling approach that integrates multilingual large language models (e.g., LLaMA-2, Aya-23) with multimodal encoders to enable text–image collaborative disambiguation. Experimental results demonstrate that visual cues substantially improve cross-lingual linking accuracy—particularly mitigating textual ambiguity in low-resource languages. Our benchmark is the first to systematically quantify the critical gain from multimodal signals in multilingual entity linking, establishing a reproducible and extensible evaluation standard for future research.

Technology Category

Application Category

📝 Abstract
This paper introduces MERLIN, a novel testbed system for the task of Multilingual Multimodal Entity Linking. The created dataset includes BBC news article titles, paired with corresponding images, in five languages: Hindi, Japanese, Indonesian, Vietnamese, and Tamil, featuring over 7,000 named entity mentions linked to 2,500 unique Wikidata entities. We also include several benchmarks using multilingual and multimodal entity linking methods exploring different language models like LLaMa-2 and Aya-23. Our findings indicate that incorporating visual data improves the accuracy of entity linking, especially for entities where the textual context is ambiguous or insufficient, and particularly for models that do not have strong multilingual abilities. For the work, the dataset, methods are available here at https://github.com/rsathya4802/merlin
Problem

Research questions and friction points this paper is trying to address.

Develops multilingual multimodal entity recognition testbed
Links news titles and images across five languages
Evaluates visual data impact on entity linking accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multilingual multimodal testbed with five languages
Visual data integration improves ambiguous entity linking
Benchmarks using LLaMa-2 and Aya-23 language models
🔎 Similar Papers
No similar papers found.
Sathyanarayanan Ramamoorthy
Sathyanarayanan Ramamoorthy
Carnegie Mellon University
V
Vishwa Shah
Carnegie Mellon University
Simran Khanuja
Simran Khanuja
Carnegie Mellon University
Multilingual NLPMultimodal NLP
Z
Zaid Sheikh
Carnegie Mellon University
S
Shan Jie
Defence Science and Technology Agency, Singapore
A
Ann Chia
Defence Science and Technology Agency, Singapore
S
Shearman Chua
Defence Science and Technology Agency, Singapore
Graham Neubig
Graham Neubig
Carnegie Mellon University, All Hands AI
Natural Language ProcessingMachine LearningArtificial Intelligence