CADEL: A Corpus of Administrative Web Documents for Japanese Entity Linking

📅 2026-03-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the scarcity of high-quality annotated corpora covering Japan-specific entities, which has hindered research on Japanese entity linking. To bridge this gap, the authors present the first systematically constructed Japanese entity linking corpus focused on the Japanese administrative domain. They develop a dedicated annotation guideline, annotate the documents manually, and evaluate inter-annotator agreement to confirm that the annotations are consistent. A preliminary entity disambiguation experiment based on string matching suggests that the corpus contains a substantial number of non-trivial mentions, which supports its potential use as an evaluation benchmark for future research in Japanese entity linking.
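As an illustration of what an inter-annotator agreement evaluation can look like, the sketch below computes Cohen's kappa over the entity IDs that two annotators assigned to the same mentions. The summary does not say which agreement statistic the authors used, so kappa is only one common choice shown here as an assumption. The function name `cohen_kappa`, the entity IDs (`E1`, `E2`, `E3`), and the `NIL` label are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    Assumption: each item is a mention and each label is the entity ID
    (or "NIL") chosen by that annotator. This is not taken from the paper.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical toy annotations for four mentions.
annotator_a = ["E1", "E2", "E1", "NIL"]
annotator_b = ["E1", "E2", "E3", "NIL"]
kappa = cohen_kappa(annotator_a, annotator_b)
```

With these toy labels the annotators agree on three of four mentions, and the chance-corrected score is lower than the raw 0.75 agreement.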

📝 Abstract

Entity linking is the task of associating linguistic expressions with entries in a knowledge base that represent real-world entities and concepts. Language resources for this task have primarily been developed for English, and the resources available for evaluating Japanese systems remain limited. In this study, we develop a corpus design policy for the entity linking task and construct an annotated corpus for training and evaluating Japanese entity linking systems, with rich coverage of linguistic expressions referring to entities that are specific to Japan. Evaluation of inter-annotator agreement confirms the high consistency of the annotations in the corpus, and a preliminary experiment on entity disambiguation based on string matching suggests that the corpus contains a substantial number of non-trivial cases, supporting its potential usefulness as an evaluation benchmark.
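To make the idea of a string-matching disambiguation baseline concrete, here is a minimal sketch, assuming the baseline links a mention only when its surface form exactly matches the title of a single knowledge base entry. The abstract does not describe the matching procedure in detail, so the normalization, the function names, and the toy entries with IDs `E1` to `E4` are all hypothetical.

```python
from collections import defaultdict


def build_title_index(kb_titles):
    """Map each normalized title to the IDs of the KB entries that carry it."""
    index = defaultdict(list)
    for entry_id, title in kb_titles.items():
        index[title.strip().lower()].append(entry_id)
    return index


def link_by_exact_match(mention, index):
    """Return the unique matching entry ID, or None if absent or ambiguous."""
    candidates = index.get(mention.strip().lower(), [])
    return candidates[0] if len(candidates) == 1 else None


def accuracy(gold, index):
    """Fraction of gold (mention, entry_id) pairs the baseline links correctly."""
    if not gold:
        return 0.0
    correct = sum(1 for mention, entry_id in gold
                  if link_by_exact_match(mention, index) == entry_id)
    return correct / len(gold)


# Hypothetical toy knowledge base: entry ID -> title.
kb_titles = {
    "E1": "総務省",
    "E2": "デジタル庁",
    "E3": "中央区 (東京都)",
    "E4": "中央区 (大阪市)",
}
index = build_title_index(kb_titles)

# Hypothetical gold annotations: (mention, correct entry ID).
gold = [("総務省", "E1"), ("デジタル庁", "E2"), ("中央区", "E3")]
score = accuracy(gold, index)
```

In this toy setup the mention 中央区 matches no title exactly, because both candidate entries carry a disambiguating suffix, so the baseline leaves it unlinked. Mentions of this kind are what the abstract calls non-trivial cases.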

Problem

Research questions and friction points this paper is trying to address.

entity linking

Japanese

corpus

annotation

knowledge base

Innovation

Methods, ideas, or system contributions that make the work stand out.

entity linking

Japanese NLP

annotated corpus