Schema Generation for Large Knowledge Graphs Using Large Language Models

📅 2025-06-04

📈 Citations: 0

✨ Influential: 0

career value

152K/year

🤖 AI Summary

Traditional knowledge graph (KG) schema construction heavily relies on manual curation by domain experts, limiting scalability and maintainability. Method: We propose the first automated KG schema generation method targeting the Shape Expressions (ShEx) formal language, leveraging large language models (LLMs) in a multi-stage pipeline that jointly incorporates local structural patterns and global semantic context from KGs. Contribution/Results: To support rigorous evaluation, we introduce two benchmark datasets—YAGO Schema and Wikidata EntitySchema—and define dedicated metrics for ShEx schema quality. Experiments across multiple large-scale KGs demonstrate that our approach generates highly accurate, formally verifiable ShEx schemas, significantly improving automation and scalability. This work advances the paradigm shift from manual to LLM-driven KG schema engineering and establishes a novel benchmark and methodology for applying LLMs to syntactically strict, formal specification languages.

Technology Category

Application Category

📝 Abstract

Schemas are vital for ensuring data quality in the Semantic Web and natural language processing. Traditionally, their creation demands substantial involvement from knowledge engineers and domain experts. Leveraging the impressive capabilities of large language models (LLMs) in related tasks like ontology engineering, we explore automatic schema generation using LLMs. To bridge the resource gap, we introduce two datasets: YAGO Schema and Wikidata EntitySchema, along with evaluation metrics. The LLM-based pipelines effectively utilize local and global information from knowledge graphs (KGs) to generate validating schemas in Shape Expressions (ShEx). Experiments demonstrate LLMs' strong potential in producing high-quality ShEx schemas, paving the way for scalable, automated schema generation for large KGs. Furthermore, our benchmark introduces a new challenge for structured generation, pushing the limits of LLMs on syntactically rich formalisms.

Problem

Research questions and friction points this paper is trying to address.

Automating schema generation for large knowledge graphs

Reducing reliance on manual expert involvement

Evaluating LLMs for scalable ShEx schema production

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs automate schema generation for knowledge graphs

Introduces YAGO and Wikidata datasets for evaluation

Generates ShEx schemas using local and global KG info

🔎 Similar Papers

No similar papers found.