Unsupervised Robust Cross-Lingual Entity Alignment via Neighbor Triple Matching with Entity and Relation Texts

📅 2024-07-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing unsupervised cross-lingual entity alignment methods suffer from three key limitations: (1) neglect of relational semantics, (2) over-reliance on the strong isomorphism assumption between source and target knowledge graphs (KGs), and (3) vulnerability to textual noise such as translation inconsistencies and out-of-vocabulary (OOV) terms. To address these issues, we propose ERAlign, an unsupervised and robust framework introducing a novel “alignment–verification” two-stage paradigm. ERAlign jointly models entity and relational semantics via linearized neighborhood triplets and cross-lingual encoders (e.g., mBERT), enabling fine-grained entity- and relation-level alignment. It explicitly relaxes the isomorphism assumption by incorporating iterative alignment fusion and a text-based verification mechanism to enhance robustness. Extensive experiments on multiple cross-lingual KG benchmarks demonstrate that ERAlign significantly outperforms state-of-the-art unsupervised methods, achieving an average accuracy gain of 8.2%. Notably, it maintains over 95% accuracy under noisy conditions.
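The idea of linearized neighbor triples mentioned above can be illustrated with a minimal sketch. This is not the authors' code: the function name `linearize_neighbors`, the `max_triples` cutoff, and the `;`-separated output format are all assumptions made for illustration. In practice the resulting string would be passed to a cross-lingual text encoder, which is not shown here.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def linearize_neighbors(entity: str, triples: List[Triple], max_triples: int = 3) -> str:
    """Render the triples that mention `entity` as one text string.

    Hypothetical format: "<entity>: <h> <r> <t>; <h> <r> <t>; ...".
    """
    # Keep only triples in which the entity appears as head or tail.
    neighbors = [t for t in triples if entity in (t[0], t[2])]
    # Turn each triple into a short phrase, keeping at most `max_triples`.
    parts = [f"{h} {r} {t}" for h, r, t in neighbors[:max_triples]]
    return f"{entity}: " + "; ".join(parts)


# Toy knowledge graph, invented for this example.
toy_triples = [
    ("Paris", "capital of", "France"),
    ("Paris", "located in", "Europe"),
    ("Berlin", "capital of", "Germany"),
]
paris_text = linearize_neighbors("Paris", toy_triples)
```

Two entities whose linearized neighborhoods read alike in their respective languages are more plausible matches than two entities that only share a similar name, which is the intuition behind a text-based verification step.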

📝 Abstract
Cross-lingual entity alignment (EA) enables the integration of multiple knowledge graphs (KGs) across different languages, providing users with seamless access to diverse and comprehensive knowledge. Existing methods, mostly supervised, face challenges in obtaining labeled entity pairs. To address this, recent studies have shifted towards self-supervised and unsupervised frameworks. Despite their effectiveness, these approaches have three limitations: (1) Relation passing: they focus mainly on entities while neglecting the semantic information of relations; (2) Isomorphic assumption: they assume isomorphism between the source and target graphs, which introduces noise and reduces alignment accuracy; and (3) Noise vulnerability: they are susceptible to noise in the textual features, especially when encountering inconsistent translations or Out-of-Vocabulary (OOV) problems. In this paper, we propose ERAlign, an unsupervised and robust cross-lingual EA pipeline that jointly performs Entity-level and Relation-level Alignment through a neighbor triple matching strategy that uses the semantic textual features of relations and entities. Its refinement step iteratively enhances results by fusing entity-level and relation-level alignments based on neighbor triple matching. The additional verification step examines each entity's neighbor triples as linearized text. This Align-then-Verify pipeline rigorously assesses alignment results, achieving near-perfect alignment even in the presence of noisy textual features of entities. Our extensive experiments demonstrate that the robustness and general applicability of ERAlign improve the accuracy and effectiveness of EA tasks, contributing significantly to knowledge-oriented applications.
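The refinement step described in the abstract, fusing entity-level and relation-level alignment through neighbor triple matching, can be sketched roughly as follows. The abstract does not give the scoring formula, so everything here is an assumption for illustration: the weighted-average fusion, the weight `lam`, the number of `steps`, the fallback for entities without neighbors, and all function names. The similarity values are invented toy numbers, not results from the paper.

```python
from typing import Dict, List, Tuple

Sim = Dict[Tuple[str, str], float]            # (source item, target item) -> similarity
Neighbors = Dict[str, List[Tuple[str, str]]]  # entity -> [(relation, neighbor entity)]


def triple_match_score(src, tgt, ent_sim: Sim, rel_sim: Sim) -> float:
    """Average, over source triples, of the best-matching target triple."""
    total = 0.0
    for r_s, n_s in src:
        total += max(
            rel_sim.get((r_s, r_t), 0.0) * ent_sim.get((n_s, n_t), 0.0)
            for r_t, n_t in tgt
        )
    return total / len(src)


def refine(ent_sim: Sim, rel_sim: Sim, src_nbrs: Neighbors, tgt_nbrs: Neighbors,
           lam: float = 0.5, steps: int = 2) -> Sim:
    """Iteratively fuse text similarity with neighbor triple matching (assumed rule)."""
    current = dict(ent_sim)
    for _ in range(steps):
        updated = {}
        for (e_s, e_t), base in ent_sim.items():
            s_n, t_n = src_nbrs.get(e_s, []), tgt_nbrs.get(e_t, [])
            if not s_n or not t_n:
                # No structural evidence on one side: keep the text score.
                updated[(e_s, e_t)] = base
                continue
            match = triple_match_score(s_n, t_n, current, rel_sim)
            updated[(e_s, e_t)] = (1 - lam) * base + lam * match
        current = updated
    return current


# Toy data: noisy text features make "Paris" look closer to "Texas_en".
ent_sim = {
    ("Paris", "Paris_en"): 0.5, ("Paris", "Texas_en"): 0.6,
    ("Paris", "France_en"): 0.1, ("France", "France_en"): 0.9,
    ("France", "Paris_en"): 0.1, ("France", "Texas_en"): 0.1,
}
rel_sim = {("capitale de", "capital of"): 0.9, ("capitale de", "located in"): 0.3}
src_nbrs = {"Paris": [("capitale de", "France")]}
tgt_nbrs = {"Paris_en": [("capital of", "France_en")],
            "Texas_en": [("located in", "USA_en")]}

refined = refine(ent_sim, rel_sim, src_nbrs, tgt_nbrs)
```

In this toy case the raw text similarity prefers the wrong target, while the refined score prefers `Paris_en` because its neighbor triple matches on both the relation and the neighboring entity. This illustrates why relation semantics can help under noisy entity text without requiring the two graphs to be isomorphic.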

Problem

Research questions and friction points this paper is trying to address.

Cross-lingual Entity Alignment
Unsupervised Learning
Semantic Relational Meaning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unsupervised Learning
Entity and Relation Alignment
Noise Robustness