🤖 AI Summary
Entity linking in historical music texts is severely hindered by the absence or sparsity of named entities, especially historical figures, works, and institutions, in mainstream knowledge bases (KBs). Method: We propose an unsupervised, knowledge graph (KG)-enhanced framework that integrates KG logical constraints with heuristic NIL (Not-In-KB) entity identification, leveraging KG embedding, unsupervised semantic matching, and historical context modeling. Contribution/Results: To support evaluation, we introduce MHERCL, the first fine-grained, human-annotated benchmark for music cultural heritage, covering three underrepresented entity types. Experiments demonstrate that our model consistently outperforms state-of-the-art methods on MHERCL, achieving substantial gains in linking robustness and cross-domain generalization. This validates the efficacy of KG-guided unsupervised learning for entity linking in historical texts.
📝 Abstract
Linking named entities occurring in text to their corresponding entries in a Knowledge Base (KB) is challenging, especially when dealing with historical texts. In this work, we introduce Musical Heritage named Entities Recognition, Classification and Linking (MHERCL), a novel benchmark consisting of manually annotated sentences extracted from historical periodicals of the music domain. MHERCL contains named entities that are under-represented or absent in the best-known KBs. We experiment with several state-of-the-art models on the Entity Linking (EL) task and show that MHERCL is a challenging dataset for all of them. We propose a novel unsupervised EL model and a method to extend supervised entity linkers by using Knowledge Graphs (KGs) to tackle the main difficulties posed by historical documents. Our experiments reveal that relying on unsupervised techniques and augmenting models with KG-based logical constraints and heuristics for predicting NIL entities (entities not represented in the reference KB) yield better EL performance on historical documents.
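The abstract does not spell out how the KG constraints or the NIL heuristics work, so the following is only a minimal illustrative sketch, not the authors' method. It assumes a hypothetical `Candidate` record with a similarity score in [0, 1] and a set of KG types, a type-compatibility check standing in for a "logical constraint", and a fixed score threshold standing in for a "NIL heuristic". All names, identifiers, and the threshold value are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    """Hypothetical KB candidate for a mention (not from the paper)."""
    kb_id: str
    score: float          # assumed mention-to-entry similarity in [0, 1]
    kb_types: frozenset   # types asserted for the entry in the KG


def link_or_nil(
    candidates: Iterable[Candidate],
    mention_type: str,
    nil_threshold: float = 0.5,  # assumed value, would need tuning
) -> Optional[str]:
    """Return the id of the best type-compatible candidate, or None for NIL."""
    # Stand-in for a KG constraint: keep candidates whose KG types
    # include the type predicted for the mention.
    compatible = [c for c in candidates if mention_type in c.kb_types]
    if not compatible:
        return None  # no admissible entry in the KB: predict NIL

    best = max(compatible, key=lambda c: c.score)
    # Stand-in for a NIL heuristic: reject weak matches.
    return best.kb_id if best.score >= nil_threshold else None


# Toy candidates with made-up identifiers.
toy_candidates = [
    Candidate("kb:place_1", 0.9, frozenset({"Place"})),
    Candidate("kb:person_1", 0.7, frozenset({"Person"})),
]
print(link_or_nil(toy_candidates, "Person"))
print(link_or_nil(toy_candidates, "Person", nil_threshold=0.8))
```

In this toy setup the higher-scoring place entry is discarded by the type check when the mention is tagged as a person, and raising the threshold turns a weak match into a NIL prediction. A real system would likely use richer constraints and learned or tuned thresholds.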