Evaluation of LLMs on Long-tail Entity Linking in Historical Documents

📅 2025-05-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study presents the first systematic evaluation of large language models (LLMs) on long-tail entity linking (EL) in historical documents—targeting low-frequency, domain-specific, and sparsely annotated historical entities. Using the manually curated MHERCL v0.1 benchmark, we employ zero-shot prompting with Wikidata as the external knowledge source and compare LLMs (GPT and Llama3) against traditional EL systems (e.g., ReLiK). Results demonstrate that LLMs significantly outperform baselines, achieving a 23.6% absolute accuracy gain on long-tail entities and substantially narrowing the performance gap between head and tail entities. Crucially, this improvement is attained without fine-tuning and with minimal reliance on external resources, underscoring LLMs’ practical utility and strong generalization capacity for low-resource historical natural language processing tasks. The findings highlight LLMs’ promise in addressing data scarcity and domain specificity challenges endemic to historical text analysis.
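The zero-shot setup described above can be sketched as a simple prompt-construction step: given a sentence and an entity mention, the model is asked to return the corresponding Wikidata identifier. The template and function name below are illustrative assumptions, not the authors' exact prompt.

```python
def build_el_prompt(sentence: str, mention: str) -> str:
    """Build a zero-shot entity-linking prompt.

    Illustrative template only; the paper's exact prompt
    wording is not reproduced here.
    """
    return (
        "You are an entity linking assistant.\n"
        f"Sentence: {sentence}\n"
        f"Mention: {mention}\n"
        "Return the Wikidata QID of the entity this mention refers to, "
        "or NIL if no suitable Wikidata entry exists."
    )

# Example with a historical-domain sentence of the kind found in MHERCL
prompt = build_el_prompt(
    "The maestro conducted the premiere at the Teatro alla Scala.",
    "Teatro alla Scala",
)
```

The returned string would then be sent to the LLM (e.g. GPT or Llama3) as-is, with no fine-tuning, and the predicted QID compared against the gold Wikidata annotation.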

📝 Abstract
Entity Linking (EL) plays a crucial role in Natural Language Processing (NLP) applications, enabling the disambiguation of entity mentions by linking them to their corresponding entries in a reference knowledge base (KB). Thanks to their deep contextual understanding capabilities, LLMs offer a new perspective on EL, promising better results than traditional methods. Despite the impressive generalization capabilities of LLMs, linking less popular, long-tail entities remains challenging, as these entities are often underrepresented in training data and knowledge bases. Furthermore, long-tail EL is an understudied problem, and few studies address it with LLMs. In the present work, we assess the performance of two popular LLMs, GPT and Llama3, in a long-tail entity linking scenario. Using MHERCL v0.1, a manually annotated benchmark of sentences from domain-specific historical texts, we quantitatively compare the performance of LLMs in identifying and linking entities to their corresponding Wikidata entries against that of ReLiK, a state-of-the-art Entity Linking and Relation Extraction framework. Our preliminary experiments reveal that LLMs perform encouragingly well in long-tail EL, indicating that this technology can be a valuable adjunct in closing the gap between head and long-tail EL.
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs on long-tail entity linking in historical documents
Addressing underrepresentation of long-tail entities in training data
Comparing LLMs with traditional methods in entity linking performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs for long-tail entity linking
Comparison with ReLiK framework
Using MHERCL v0.1 benchmark
Marta Boscariol
Department of Management, University of Turin, Italy
Luana Bulla
University of Catania
Lia Draetta
Department of Computer Science, University of Turin, Italy
Beatrice Fiumano
Department of Modern Languages, Literatures and Cultures, University of Bologna, Italy
Emanuele Lenzi
Department of Information Engineering (DII), University of Pisa, Italy; Institute of Information Science and Technologies (ISTI), National Research Council of Italy (CNR), Pisa, Italy
Leonardo Piano
Department of Mathematics and Computer Science, University of Cagliari, Italy