🤖 AI Summary
Graph database query languages apply various ad-hoc semantics to return a finite set of walks for regular path queries (RPQs), yet these semantics lack systematic comparison and shared design principles. This work proposes a formal framework that models an RPQ semantics as a function mapping a graph database and a query to a finite set of walks, and introduces a set of desirable properties that such functions may satisfy. By analyzing the compatibility and feasibility of these properties, the study shows that some properties are mutually exclusive and that others cannot be met at all, which reveals inherent limitations in several mainstream RPQ semantics. Building on these insights, the paper presents several new RPQ semantics as examples, some of which may inform the design of future graph query languages.
📝 Abstract
Modern property graph database query languages such as Cypher, PGQL, GSQL, and the standard GQL draw inspiration from the formalism of regular path queries (RPQs). In order to output walks explicitly, they depart from the classical and well-studied homomorphism semantics. However, it then becomes difficult to present results to users because RPQs may match infinitely many walks. The aforementioned languages use ad-hoc criteria to select a finite subset of those matches. For instance, Cypher uses trail semantics, discarding walks with repeated edges; PGQL and GSQL use shortest walk semantics, retaining only the walks of minimal length among all matched walks; and GQL allows users to choose from several semantics. Even though there is academic research on these semantics, it focuses almost exclusively on evaluation efficiency. In an attempt to better understand, choose and design RPQ semantics, we present a framework to categorize and compare them according to other criteria. We formalize several possible properties, pertaining to the study of RPQ semantics seen as mathematical functions mapping a database and a query to a finite set of walks. We show that some properties are mutually exclusive, or cannot be met. We also give several new RPQ semantics as examples. Some of them may provide ideas for the design of new semantics for future graph database query languages.
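To make the contrast between trail semantics and shortest walk semantics concrete, here is a minimal Python sketch. It is not the paper's formalism. The toy graph, the function names (`walks`, `matches`, `trail_semantics`, `shortest_semantics`), the fixed pair of endpoints, and the length bound `max_len` are all assumptions introduced for illustration. The query `a*b` matches infinitely many walks from `u` to `w` in this graph because of the cycle between `u` and `v`, so the sketch enumerates matches only up to a bounded length and then applies each selection criterion.

```python
import re

# Hypothetical toy graph: each edge is (source, label, target);
# the list index serves as the edge identifier.
EDGES = [
    ("u", "a", "v"),  # edge 0
    ("v", "a", "u"),  # edge 1
    ("v", "b", "w"),  # edge 2
    ("u", "b", "w"),  # edge 3
]


def walks(source, target, max_len):
    """Yield walks from source to target with at most max_len edges.

    A walk is a tuple of edge identifiers. The bound is an assumption of
    this sketch: without it the enumeration would not terminate on a
    graph with cycles.
    """
    frontier = [((), source)]
    for _ in range(max_len + 1):
        next_frontier = []
        for walk, node in frontier:
            if node == target:
                yield walk
            for eid, (src, _label, dst) in enumerate(EDGES):
                if src == node:
                    next_frontier.append((walk + (eid,), dst))
        frontier = next_frontier


def matches(walk, pattern):
    """Check whether the label sequence of a walk matches the regex."""
    labels = "".join(EDGES[eid][1] for eid in walk)
    return re.fullmatch(pattern, labels) is not None


def trail_semantics(matched):
    """Keep only walks that never repeat an edge."""
    return [w for w in matched if len(set(w)) == len(w)]


def shortest_semantics(matched):
    """Keep only walks of minimal length among the matched walks."""
    if not matched:
        return []
    best = min(len(w) for w in matched)
    return [w for w in matched if len(w) == best]


# Matches of a*b from u to w, truncated at len(EDGES) edges. This bound
# loses no trail, since a trail cannot use more edges than the graph has.
matched = [w for w in walks("u", "w", len(EDGES)) if matches(w, "a*b")]

print("matched (bounded):", matched)
print("trails:           ", trail_semantics(matched))
print("shortest:         ", shortest_semantics(matched))
```

On this graph the two criteria select different finite subsets of the same infinite match set: trail semantics keeps three walks, while shortest walk semantics keeps only the single-edge walk.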