mRAKL: Multilingual Retrieval-Augmented Knowledge Graph Construction for Low-Resourced Languages

📅 2025-07-21

🤖 AI Summary

This work addresses multilingual knowledge graph construction (mKGC) for two low-resourced languages, Tigrinya and Amharic, by reformulating mKGC as a question answering (QA) task. The head entity and linking relation of a triple are expressed as a question, and the model predicts the missing tail entity as the answer. The proposed system, mRAKL, is based on retrieval-augmented generation (RAG): a BM25 retriever supplies context that conditions answer generation, and the higher-resourced languages Arabic and English are used for cross-lingual transfer. Key findings: (i) with a BM25 retriever, the RAG-based approach outperforms a no-context setting; (ii) in ablation studies with an idealized retrieval system, mRAKL improves accuracy by 4.92 percentage points for Tigrinya and 8.79 percentage points for Amharic.
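To illustrate how a BM25 retriever ranks candidate passages for a query, here is a minimal Okapi BM25 sketch. The toy corpus, the regex tokenizer, and the parameter values k1=1.5 and b=0.75 (common defaults) are assumptions for illustration, not details taken from mRAKL.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase and split on Unicode word characters, which also
    # matches Ge'ez-script letters; a real system may need a proper tokenizer.
    return re.findall(r"\w+", text.lower())


class BM25:
    """Okapi BM25 ranking over a small in-memory corpus (illustrative sketch)."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.tokens = [tokenize(d) for d in docs]
        self.lengths = [len(t) for t in self.tokens]
        self.avgdl = sum(self.lengths) / len(self.tokens)
        self.tf = [Counter(t) for t in self.tokens]
        # Document frequency: how many documents contain each term.
        df = Counter(term for t in self.tokens for term in set(t))
        n = len(self.tokens)
        # IDF with +1 inside the log so scores stay non-negative.
        self.idf = {
            term: math.log((n - f + 0.5) / (f + 0.5) + 1) for term, f in df.items()
        }

    def score(self, query: str, i: int) -> float:
        """BM25 score of document i for the query."""
        tf, dl = self.tf[i], self.lengths[i]
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue
            # Term-frequency saturation (k1) with document-length normalization (b).
            denom = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / denom
        return total

    def top_k(self, query: str, k: int = 3) -> list[int]:
        """Indices of the k highest-scoring documents (ties broken by index)."""
        ranked = sorted(range(len(self.tokens)), key=lambda i: (-self.score(query, i), i))
        return ranked[:k]


# Hypothetical toy corpus standing in for a knowledge source.
docs = [
    "Addis Ababa is the capital of Ethiopia.",
    "Asmara is the capital of Eritrea.",
    "Tigrinya is spoken in Eritrea and Ethiopia.",
]
retriever = BM25(docs)
print(retriever.top_k("capital of Eritrea", k=1))  # → [1]
```

The top-ranked passages would then be passed as context to the generator; BM25 is purely lexical, so it can miss relevant passages that use different wording, which is one reason an idealized retriever can perform better.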

📝 Abstract

Knowledge Graphs represent real-world entities and the relationships between them. Multilingual Knowledge Graph Construction (mKGC) refers to the task of automatically constructing or predicting missing entities and links for knowledge graphs in a multilingual setting. In this work, we reformulate the mKGC task as a Question Answering (QA) task and introduce mRAKL: a Retrieval-Augmented Generation (RAG) based system to perform mKGC. We achieve this by expressing the head entity and linking relation as a question and having our model predict the tail entity as the answer. Our experiments focus primarily on two low-resourced languages: Tigrinya and Amharic. We experiment with the higher-resourced languages Arabic and English for cross-lingual transfer. With a BM25 retriever, we find that the RAG-based approach improves performance over a no-context setting. Further, our ablation studies show that with an idealized retrieval system, mRAKL improves accuracy by 4.92 and 8.79 percentage points for Tigrinya and Amharic, respectively.
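The QA reformulation described in the abstract can be sketched as follows: a (head, relation) pair becomes a natural-language question, retrieved passages are optionally prepended as context, and predicted tails are scored by accuracy. The question templates, prompt layout, function names (`to_question`, `build_prompt`, `accuracy`), and exact-match scoring are all hypothetical; the abstract does not specify mRAKL's templates or evaluation details.

```python
from typing import List, Sequence

# Hypothetical relation-to-question templates (not taken from mRAKL).
TEMPLATES = {
    "capital": "What is the capital of {head}?",
    "official language": "What is the official language of {head}?",
}


def to_question(head: str, relation: str) -> str:
    """Turn a (head, relation) pair into a question whose answer is the tail."""
    template = TEMPLATES.get(relation, "What is the {relation} of {head}?")
    return template.format(head=head, relation=relation)


def build_prompt(head: str, relation: str, passages: Sequence[str] = ()) -> str:
    """QA prompt with optional retrieved context (empty = no-context setting)."""
    question = to_question(head, relation)
    if not passages:
        return f"Question: {question}\nAnswer:"
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\nQuestion: {question}\nAnswer:"


def accuracy(predictions: List[str], gold_tails: List[str]) -> float:
    """Exact-match accuracy in percent (assumed metric; case/space-insensitive)."""
    correct = sum(
        p.strip().lower() == g.strip().lower() for p, g in zip(predictions, gold_tails)
    )
    return 100.0 * correct / len(gold_tails)


print(build_prompt("Eritrea", "capital", ["Asmara is the capital of Eritrea."]))
```

A difference between two such accuracy values is an absolute difference, which is what "percentage points" in the abstract refers to, as opposed to a relative percentage change.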

Problem

Research questions and friction points this paper is trying to address.

Constructs multilingual knowledge graphs for low-resourced languages.

Reformulates knowledge graph construction as question answering.

Improves accuracy using retrieval-augmented generation for Tigrinya and Amharic.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reformulates mKGC as a QA task

Uses Retrieval-Augmented Generation (RAG)

Focuses on low-resourced languages

🔎 Similar Papers

GrEmLIn: A Repository of Green Baseline Embeddings for 87 Low-Resource Languages Injected with Multilingual Graph Knowledge