🤖 AI Summary
This work addresses factual hallucinations in large language models, which often lack traceable sources, by introducing FactNet, a large-scale multilingual knowledge graph built from 316 Wikipedia editions. FactNet comprises 1.7 billion atomic facts linked to 3.01 billion byte-level reproducible evidence pointers, enabling precise attribution. Through a deterministic extraction and alignment pipeline, FactNet unifies structured knowledge with multilingual textual evidence, improving auditability and coverage, including for low-resource languages. Extensive auditing confirms a fact grounding precision of 92.1%, even in long-tail languages, and the accompanying FactNet-Bench suite supports evaluation on knowledge graph completion, question answering, and fact verification.
📝 Abstract
While LLMs exhibit remarkable fluency, their utility is often compromised by factual hallucinations and a lack of traceable provenance. Existing resources for grounding mitigate this but typically enforce a dichotomy: they offer either structured knowledge without textual context (e.g., knowledge bases) or grounded text with limited scale and linguistic coverage. To bridge this gap, we introduce FactNet, a massive, open-source resource designed to unify 1.7 billion atomic assertions with 3.01 billion auditable evidence pointers derived exclusively from 316 Wikipedia editions. Unlike recent synthetic approaches, FactNet employs a strictly deterministic construction pipeline, ensuring that every evidence unit is recoverable with byte-level precision. Extensive auditing confirms a high grounding precision of 92.1%, even in long-tail languages. Furthermore, we establish FactNet-Bench, a comprehensive evaluation suite for Knowledge Graph Completion, Question Answering, and Fact Checking. FactNet provides the community with a foundational, reproducible resource for training and evaluating trustworthy, verifiable multilingual systems.
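To make the notion of a byte-level reproducible evidence pointer concrete, the sketch below shows one plausible shape such a record could take: a pinned revision plus a byte span, from which the evidence text is deterministically recoverable. All field and function names here are illustrative assumptions, not taken from the paper.

```python
from dataclasses import dataclass

# Hypothetical evidence pointer in the spirit of FactNet's recoverable
# evidence units; the fields are assumptions for illustration only.
@dataclass(frozen=True)
class EvidencePointer:
    wiki: str         # Wikipedia edition, e.g. "en"
    page_id: int      # article identifier
    revision_id: int  # pinned revision, so extraction is reproducible
    byte_start: int   # span start in the revision's bytes, inclusive
    byte_end: int     # span end, exclusive

def recover_span(article_bytes: bytes, ptr: EvidencePointer) -> str:
    """Deterministically re-extract the evidence text from the pinned revision."""
    return article_bytes[ptr.byte_start:ptr.byte_end].decode("utf-8")

# Toy text standing in for a pinned revision's content.
article = "Marie Curie was born in Warsaw in 1867.".encode("utf-8")
ptr = EvidencePointer(wiki="en", page_id=20408, revision_id=1234567,
                      byte_start=24, byte_end=30)
print(recover_span(article, ptr))  # → Warsaw
```

Because the pointer names a fixed revision and exact byte offsets rather than a sentence index or a text match, the same evidence unit is recovered byte-for-byte by anyone holding the same dump, which is what makes the attribution auditable.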