🤖 AI Summary
This work addresses the lack of a unified evaluation benchmark for Multilingual Multimodal Entity Linking. We introduce the first dedicated benchmark for this task, comprising BBC news headlines paired with corresponding images in five languages, annotated with over 7,000 entity mentions linked to more than 2,500 Wikidata entities. To tackle the task, we propose a joint modeling approach that combines multilingual large language models (e.g., LLaMA-2, Aya-23) with multimodal encoders so that text and images jointly disambiguate entity mentions. Experimental results demonstrate that visual cues substantially improve cross-lingual linking accuracy, in particular by mitigating textual ambiguity in low-resource languages. Our benchmark is the first to systematically quantify the gain from multimodal signals in multilingual entity linking, establishing a reproducible and extensible evaluation standard for future research.
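The summary describes the joint modeling approach only at a high level. As a minimal sketch of what text–image collaborative disambiguation over candidate Wikidata entities could look like, the snippet below late-fuses cosine similarities from a text encoder and an image encoder; the function name, the fusion weight `alpha`, and the assumption that candidates come with textual descriptions and reference images are illustrative choices, not the paper's confirmed architecture.

```python
# Illustrative sketch (not the authors' exact method): score a mention
# against each candidate Wikidata entity by fusing textual similarity
# (mention context vs. entity description) with visual similarity
# (headline image vs. a candidate's reference image).
import torch
import torch.nn.functional as F

def fused_entity_scores(
    mention_emb: torch.Tensor,      # (d,) embedding of the mention in context
    image_emb: torch.Tensor,        # (d,) embedding of the headline image
    cand_text_embs: torch.Tensor,   # (k, d) candidate entity description embeddings
    cand_image_embs: torch.Tensor,  # (k, d) candidate entity image embeddings
    alpha: float = 0.5,             # hypothetical weight on the visual signal
) -> torch.Tensor:
    """Return a (k,) score per candidate; the argmax is the predicted entity."""
    text_sim = F.cosine_similarity(mention_emb.unsqueeze(0), cand_text_embs, dim=-1)
    image_sim = F.cosine_similarity(image_emb.unsqueeze(0), cand_image_embs, dim=-1)
    return (1 - alpha) * text_sim + alpha * image_sim

# Toy usage: random vectors stand in for LLM / vision-encoder outputs.
d, k = 256, 5
scores = fused_entity_scores(
    torch.randn(d), torch.randn(d), torch.randn(k, d), torch.randn(k, d)
)
predicted_candidate = scores.argmax().item()
```

When the textual context is ambiguous (e.g., a short headline naming a common surname), the visual term can break ties between otherwise similar candidates, which is consistent with the gains the summary reports for low-resource languages.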
📝 Abstract
This paper introduces MERLIN, a novel testbed for the task of Multilingual Multimodal Entity Linking. The dataset comprises BBC news article titles, paired with corresponding images, in five languages: Hindi, Japanese, Indonesian, Vietnamese, and Tamil, featuring over 7,000 named entity mentions linked to 2,500 unique Wikidata entities. We also provide several benchmarks using multilingual and multimodal entity linking methods built on language models such as LLaMA-2 and Aya-23. Our findings indicate that incorporating visual data improves the accuracy of entity linking, especially for entities whose textual context is ambiguous or insufficient, and particularly for models that lack strong multilingual abilities. The dataset and methods are available at https://github.com/rsathya4802/merlin