🤖 AI Summary
This paper addresses the challenge of fully automated knowledge graph construction from massive unstructured text without a predefined schema. Methodologically, it introduces a large language model–based framework for joint triple extraction and dynamic schema induction, achieving zero-human-intervention schema induction at billion scale for the first time. The approach integrates joint entity–event modeling, hierarchical conceptual clustering, and web-scale distributed text processing. Key contributions include: (1) removing the traditional schema-dependency bottleneck by generating dynamic schemas that reach 95% semantic alignment with human-crafted ones; (2) simultaneously extracting factual instances and organizing them into conceptual hierarchies; and (3) constructing the ATLAS graph family (900M nodes, 5.9B edges), which outperforms state-of-the-art methods on multi-hop question answering and significantly improves LLM factual accuracy.
📝 Abstract
We present AutoSchemaKG, a framework for fully autonomous knowledge graph construction that eliminates the need for predefined schemas. Our system leverages large language models to simultaneously extract knowledge triples and induce comprehensive schemas directly from text, modeling both entities and events while employing conceptualization to organize instances into semantic categories. Processing over 50 million documents, we construct ATLAS (Automated Triple Linking And Schema induction), a family of knowledge graphs with 900+ million nodes and 5.9 billion edges. This approach outperforms state-of-the-art baselines on multi-hop QA tasks and enhances LLM factuality. Notably, our schema induction achieves 95% semantic alignment with human-crafted schemas while requiring zero manual intervention, demonstrating that billion-scale knowledge graphs with dynamically induced schemas can effectively complement parametric knowledge in large language models.
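The pipeline described above — joint triple extraction followed by conceptualization into semantic categories — can be sketched in miniature. This is a hypothetical illustration, not the paper's implementation: `extract_triples` and `conceptualize` stand in for LLM prompting steps and are stubbed with canned outputs here, and all entity names and categories are invented for the example.

```python
# Minimal sketch of an AutoSchemaKG-style pipeline (illustrative only).
# In the real system, both steps below would be performed by prompting an LLM
# over 50M+ documents; here they are toy stubs to show the data flow.

def extract_triples(text: str) -> list[tuple[str, str, str]]:
    """Stub for LLM-based joint entity/event triple extraction."""
    # Canned output standing in for an LLM extraction call.
    return [
        ("Marie Curie", "won", "Nobel Prize in Physics"),
        ("Marie Curie", "born_in", "Warsaw"),
    ]

def conceptualize(node: str) -> str:
    """Stub for schema induction: map an instance to a semantic category."""
    categories = {  # toy lookup standing in for LLM conceptualization
        "Marie Curie": "Person",
        "Nobel Prize in Physics": "Award",
        "Warsaw": "City",
    }
    return categories.get(node, "Thing")

def build_graph(docs: list[str]) -> dict:
    """Assemble instance triples plus induced concept ('is_a') edges."""
    edges: list[tuple[str, str, str]] = []
    concept_of: dict[str, str] = {}
    for doc in docs:
        for head, rel, tail in extract_triples(doc):
            edges.append((head, rel, tail))
            for node in (head, tail):
                concept_of.setdefault(node, conceptualize(node))
    # Link each instance to its induced concept, forming the dynamic
    # schema layer on top of the instance layer.
    edges += [(n, "is_a", c) for n, c in concept_of.items()]
    return {"edges": edges, "schema": sorted(set(concept_of.values()))}

graph = build_graph(["Marie Curie won the Nobel Prize in Physics."])
```

The key structural idea this mirrors is that the schema is not fixed in advance: the concept layer (`is_a` edges and the induced category set) emerges from the extracted instances rather than constraining them.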