🤖 AI Summary
Multilingual knowledge graphs are often incomplete, especially in non-English languages, and earlier work has treated their two completion tasks separately. This paper proposes KG-TRICK, a unified sequence-to-sequence framework that jointly models Knowledge Graph Completion (KGC), the task of predicting missing relations between entities, and Knowledge Graph Enhancement (KGE), the task of predicting missing textual information for entities. Methodologically, the framework casts both tasks as text generation, so a single multilingual model can draw on relational triples and on entity text in several languages at once. The key contributions are: (1) evidence that KGC and KGE can be unified in a single framework; (2) evidence that combining textual information from multiple languages improves the completeness of a KG; and (3) WikiKGE10++, the largest manually curated benchmark for textual information completion of KGs, covering over 25,000 entities across 10 diverse languages.
📝 Abstract
Multilingual knowledge graphs (KGs) provide high-quality relational and textual information for various NLP applications, but they are often incomplete, especially in non-English languages. Previous research has shown that combining information from KGs in different languages aids either Knowledge Graph Completion (KGC), the task of predicting missing relations between entities, or Knowledge Graph Enhancement (KGE), the task of predicting missing textual information for entities. Although previous efforts have considered KGC and KGE as independent tasks, we hypothesize that they are interdependent and mutually beneficial. To this end, we introduce KG-TRICK, a novel sequence-to-sequence framework that unifies the tasks of textual and relational information completion for multilingual KGs. KG-TRICK demonstrates that: i) it is possible to unify the tasks of KGC and KGE into a single framework, and ii) combining textual information from multiple languages is beneficial to improve the completeness of a KG. As part of our contributions, we also introduce WikiKGE10++, the largest manually-curated benchmark for textual information completion of KGs, which features over 25,000 entities across 10 diverse languages.
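To make the idea of unifying the two tasks concrete, the following is a minimal sketch of how KGC and KGE instances could both be serialized as source/target text pairs for one sequence-to-sequence model. The abstract does not specify KG-TRICK's actual input format, so everything here is an assumption made for illustration: the `[KGC]` and `[KGE]` task prefixes, the field separators, and the helper names `Entity`, `context`, `kgc_example`, and `kge_example` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Entity:
    """An entity with names in several languages (hypothetical structure)."""
    qid: str
    names: Dict[str, str] = field(default_factory=dict)  # language code -> name


def context(entity: Entity, langs: List[str]) -> str:
    """Concatenate the entity's known names in the requested languages."""
    return " | ".join(f"{l}: {entity.names[l]}" for l in langs if l in entity.names)


def kgc_example(head: Entity, relation: str, tail: Entity,
                langs: List[str], target_lang: str) -> Tuple[str, str]:
    """Relational completion: given head and relation, generate the tail's name."""
    source = (f"[KGC] head: {context(head, langs)} ; "
              f"relation: {relation} ; tail ({target_lang}): ?")
    return source, tail.names[target_lang]


def kge_example(entity: Entity, langs: List[str],
                target_lang: str) -> Tuple[str, str]:
    """Textual completion: given names in other languages, generate the missing one."""
    known = [l for l in langs if l != target_lang]  # hide the target language
    source = f"[KGE] entity: {context(entity, known)} ; name ({target_lang}): ?"
    return source, entity.names[target_lang]


# Illustrative data only; not taken from WikiKGE10++.
rome = Entity("Q220", {"en": "Rome", "it": "Roma", "de": "Rom"})
italy = Entity("Q38", {"en": "Italy", "it": "Italia"})

print(kgc_example(rome, "country", italy, ["en", "it"], "it"))
print(kge_example(rome, ["en", "it", "de"], "de"))
```

Because both functions return plain `(source, target)` string pairs, instances of either task could in principle be mixed in one training set for a single multilingual encoder-decoder, which is the sense in which the two tasks share one framework in this sketch.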