RDFGraphGen: A Synthetic RDF Graph Generator based on SHACL Constraints

📅 2024-07-25
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Developers of RDF applications often lack domain-specific, semantically compliant knowledge graphs to work with. To address this, the paper proposes a method for generating synthetic RDF graphs from SHACL constraints, reversing the usual role of SHACL as a validation language. The approach parses SHACL shape definitions, converts their constraints into data generation rules, and generates data for a user-specified number of entities, producing RDF graphs of customizable size, from small to large. The tool is implemented in Python and released as open source. The generated RDF datasets are intended to support benchmarking, testing, quality control, and training for applications in the RDF, Linked Data, and Semantic Web domains.

📝 Abstract
This paper introduces RDFGraphGen, a general-purpose, domain-independent generator of synthetic RDF graphs based on SHACL constraints. The Shapes Constraint Language (SHACL) is a W3C standard that specifies how to validate data in RDF graphs by defining constraining shapes. Although the main purpose of SHACL is the validation of existing RDF data, we envisioned and implemented a reverse role for it to address the lack of available RDF datasets in many RDF-based application development processes: we use SHACL shape definitions as a starting point for generating synthetic data for an RDF graph. The generation process extracts the constraints from the SHACL shapes, converts them into rules, and then generates artificial data for a predefined number of RDF entities based on these rules. RDFGraphGen is intended for generating small, medium, or large RDF knowledge graphs for benchmarking, testing, quality control, training, and similar uses in applications from the RDF, Linked Data, and Semantic Web domains. RDFGraphGen is open source and available as a ready-to-use Python package.
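
The three steps named in the abstract (extract constraints, convert them to rules, generate data for a fixed number of entities) can be illustrated with a minimal sketch. This is not RDFGraphGen's code or API. It uses only the Python standard library, and the `SHAPE` dictionary is a hypothetical, hand-written stand-in for constraints that would normally be parsed from a SHACL shapes graph. The `http://example.org/` names, the `to_rule` and `generate` functions, and the small subset of supported constraints are all assumptions made for illustration.

```python
import random

XSD = "http://www.w3.org/2001/XMLSchema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Step 1 (assumed already done): constraints extracted from a SHACL node shape.
# Keys loosely mirror sh:targetClass, sh:path, sh:in, sh:datatype,
# sh:minInclusive, sh:maxInclusive, sh:minCount and sh:maxCount.
SHAPE = {
    "target_class": "http://example.org/Person",
    "properties": [
        {"path": "http://example.org/name", "in": ["Ana", "Marko", "Elena"],
         "min_count": 1, "max_count": 1},
        {"path": "http://example.org/age", "datatype": XSD + "integer",
         "min_inclusive": 18, "max_inclusive": 90,
         "min_count": 0, "max_count": 1},
    ],
}


def to_rule(constraint):
    """Step 2: turn one property constraint into a function that draws values."""

    def draw_value(rng):
        if "in" in constraint:
            # Enumerated values: pick one of the allowed strings.
            return '"%s"' % rng.choice(constraint["in"])
        if constraint.get("datatype") == XSD + "integer":
            # Bounded integer: draw uniformly from the inclusive range.
            n = rng.randint(constraint["min_inclusive"], constraint["max_inclusive"])
            return '"%d"^^<%sinteger>' % (n, XSD)
        raise ValueError("unsupported constraint: %r" % (constraint,))

    def rule(rng):
        # Respect the cardinality bounds by choosing how many values to emit.
        count = rng.randint(constraint["min_count"], constraint["max_count"])
        return [draw_value(rng) for _ in range(count)]

    return rule


def generate(shape, entity_count, seed=0):
    """Step 3: apply the rules to a predefined number of entities.

    Returns a list of N-Triples lines.
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    rules = [(c["path"], to_rule(c)) for c in shape["properties"]]
    triples = []
    for i in range(entity_count):
        subject = "<http://example.org/entity/%d>" % i
        triples.append("%s <%s> <%s> ." % (subject, RDF_TYPE, shape["target_class"]))
        for path, rule in rules:
            for value in rule(rng):
                triples.append("%s <%s> %s ." % (subject, path, value))
    return triples
```

Calling `print("\n".join(generate(SHAPE, 2)))` prints the triples for two synthetic entities. A real generator would need to cover far more of SHACL (nested shapes, logical constraints, patterns, and so on); the sketch only shows how the shape drives the generated data.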
Problem

Research questions and friction points this paper is trying to address.

How to generate synthetic RDF graphs from SHACL shapes
The lack of available domain-specific RDF datasets
How to make RDF graph generation configurable and scalable
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses SHACL shapes for RDF graph generation
Domain-agnostic with configurable constraints
Includes predefined schema.org values
Authors

Marija Vecovska
Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje, N. Macedonia
Milos Jovanovik
Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje, N. Macedonia