🤖 AI Summary
This paper addresses the lack of a unified semantics for three mainstream graph schema languages: SHACL, ShEx, and PG-Schema. The languages are hard to compare because they span two data models (RDF and property graphs) and two communities (Semantic Web and databases). To this end, we propose the first unified formal semantic framework that systematically characterizes their node constraints, edge constraints, recursive expressions, and validation semantics, including the coverage relations among them. Our methodology integrates formal language-theoretic modeling, precise semantic definitions of graph constraints, cross-language functional mapping, and equivalence analysis. The results reveal both shared expressive capabilities and design divergences across the languages. Our key contribution is a comparability model that covers both the RDF and property graph paradigms and supports rigorous standardization of graph data validation, interoperability of validation tools, and principled integration of schema languages. This work provides a theoretical basis for advancing graph data governance and schema language engineering.
📝 Abstract
Graphs have emerged as an important foundation for a variety of applications, including capturing and reasoning over factual knowledge, semantic data integration, social networks, and providing factual knowledge for machine learning algorithms. To formalise certain properties of the data and to ensure data quality, there is a need to describe the schema of such graphs. Because of the breadth of applications and the availability of different data models, such as RDF and property graphs, the Semantic Web and database communities have independently developed graph schema languages: SHACL, ShEx, and PG-Schema. Each language has its own approach to defining constraints and validating graph data, leaving potential users in the dark about their commonalities and differences. In this paper, we provide formal, concise definitions of the core components of each of these schema languages. We employ a uniform framework to facilitate a comprehensive comparison between the languages and identify a common set of functionalities, shedding light on both overlapping and distinctive features of the three languages.