DELICATE: Diachronic Entity LInking using Classes And Temporal Evidence

📅 2025-11-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses three core challenges in entity linking (EL) for humanities texts, particularly historical Italian documents: complex document typologies, scarcity of domain-specific datasets and models, and poor coverage of long-tail entities in knowledge bases. It proposes DELICATE, a neuro-symbolic EL method that combines a BERT-based encoder with contextual information from Wikidata and uses temporal plausibility and entity type consistency to select appropriate knowledge-base entities. It also introduces ENEIDE, a multi-domain EL corpus in historical Italian, semi-automatically extracted from two annotated editions of literary and political texts spanning the 19th and 20th centuries. Experiments show that DELICATE outperforms other EL models on historical Italian, including larger architectures with billions of parameters, and that its confidence scores and feature sensitivity make its results more explainable and interpretable than those of purely neural methods.

📝 Abstract

Despite remarkable advances in Natural Language Processing, Entity Linking (EL) remains challenging in the humanities due to complex document typologies, a lack of domain-specific datasets and models, and long-tail entities, i.e., entities under-represented in Knowledge Bases (KBs). This paper addresses these issues with two main contributions. The first is DELICATE, a novel neuro-symbolic method for EL on historical Italian, which combines a BERT-based encoder with contextual information from Wikidata to select appropriate KB entities using temporal plausibility and entity type consistency. The second is ENEIDE, a multi-domain EL corpus in historical Italian, semi-automatically extracted from two annotated editions spanning the 19th and 20th centuries and including literary and political texts. Results show that DELICATE outperforms other EL models on historical Italian, even when compared with larger architectures with billions of parameters. Further analyses show that DELICATE's confidence scores and feature sensitivity yield results that are more explainable and interpretable than those of purely neural methods.

Problem

Research questions and friction points this paper is trying to address.

Entity Linking challenges in the humanities due to complex document typologies

Lack of domain-specific datasets and models for historical Italian

Addressing long-tail entities underrepresented in Knowledge Bases

Innovation

Methods, ideas, or system contributions that make the work stand out.

BERT-based encoder with Wikidata contextual information

Temporal plausibility and entity type consistency

Neuro-symbolic method for historical Italian EL
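
The interplay of the three points above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Candidate` fields, the hard-filter rules, the `threshold` parameter, the identifiers, and the scores are all assumptions made for illustration. The neural similarity score is assumed to be precomputed by an encoder, and the symbolic checks simply discard candidates that could not have existed when the document was written or whose class does not match the mention.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    """A knowledge-base candidate for one mention (all fields are illustrative)."""
    qid: str                   # placeholder identifier, not a real Wikidata QID
    label: str
    entity_type: str           # coarse class, e.g. "PER", "LOC", "VEHICLE"
    start_year: Optional[int]  # birth / creation year, None if unknown
    neural_score: float        # encoder similarity, assumed precomputed


def is_temporally_plausible(candidate: Candidate, document_year: int) -> bool:
    """A text cannot mention an entity that did not exist yet; unknown dates pass."""
    return candidate.start_year is None or candidate.start_year <= document_year


def is_type_consistent(candidate: Candidate, mention_type: str) -> bool:
    """The candidate's class must match the class predicted for the mention."""
    return candidate.entity_type == mention_type


def link(candidates: List[Candidate], mention_type: str,
         document_year: int, threshold: float = 0.5) -> Optional[Candidate]:
    """Apply the symbolic filters, then pick the best neural score (or None for NIL)."""
    kept = [
        c for c in candidates
        if is_temporally_plausible(c, document_year)
        and is_type_consistent(c, mention_type)
    ]
    if not kept:
        return None
    best = max(kept, key=lambda c: c.neural_score)
    return best if best.neural_score >= threshold else None


# Toy example: the mention "Garibaldi" tagged as a person in a text dated 1870.
# Scores and the two namesake entries are made up for illustration.
candidates = [
    Candidate("Q-A", "Giuseppe Garibaldi (person)", "PER", 1807, 0.81),
    Candidate("Q-B", "Garibaldi (ship, illustrative)", "VEHICLE", 1983, 0.86),
    Candidate("Q-C", "Garibaldi (later namesake, illustrative)", "PER", 1950, 0.60),
]
result = link(candidates, mention_type="PER", document_year=1870)
print(result.label if result else "NIL")
```

In this toy setup the ship has the highest neural score but is discarded by both symbolic checks, which is the kind of behaviour the temporal and type constraints are meant to produce. A softer variant could turn the two checks into score penalties instead of hard filters; the summary does not say which design the paper uses.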
