🤖 AI Summary
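Problem: GQL, the new standard query language for graph databases, lacks a rigorous characterization of its data complexity, and the relational perspective that underlies many graph database systems has received very little theoretical study.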
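Method: We adopt a relational perspective on graph database querying and show that GQL can, by and large, be expressed in extensions of relational calculus with transitive closure operators (FO[TC]) and existential second-order quantifiers (ESO). To handle data from concrete domains such as numbers, we use embedded finite model theory and exploit a generic *Restricted Quantifier Collapse* (RQC) result for FO[TC] and ESO. Regular data path queries with operations on data, formalized with register automata, are expressed in FO[TC] over embedded finite graphs.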
Contribution/Results: We obtain optimal data complexity bounds for GQL, including extensions with schema validation and with arithmetic and comparisons. We also show that regular data path queries based on register automata can be captured in FO[TC] over embedded finite graphs while preserving nondeterministic logspace (NL) data complexity. Together, these results give a unified logical framework in which algorithms and complexity bounds for graph database querying follow from classical results.
📝 Abstract
We study a relational perspective on graph database querying. Such a perspective underlies various graph database systems, but very few theoretical investigations have been conducted on it. This perspective offers a powerful and unified framework to study graph database querying, by which algorithms and complexity follow from classical results. We provide two concrete applications. The first is querying property graphs. The property graph data model supersedes previously proposed graph models and underlies the new standard GQL for graph query languages. We show that this standard can be, by and large, expressed by extensions of relational calculus with transitive closure operators (FO[TC]) and existential second-order quantifiers (ESO). With this, we obtain optimal data complexity bounds, along with extensions including schema validation. The second application is incorporating data from concrete domains (e.g., numbers) in graph database querying. We use embedded finite model theory and, by exploiting a generic Restricted Quantifier Collapse (RQC) result for FO[TC] and ESO, we obtain optimal data complexity bounds for GQL with arithmetic and comparisons. Moreover, we show that Regular Data Path Querying with operations on data (i.e., using register automata formalisms) can be captured in FO[TC] over embedded finite graphs while preserving nondeterministic logspace data complexity.
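To make the idea of a data path query expressed with transitive closure more concrete, here is a minimal Python sketch. It is an illustration only and is not taken from the paper. It assumes a toy graph in which each node carries one numeric value, and it evaluates the query "there is a non-empty path from u to v along which the values strictly increase". In FO[TC] terms this is the transitive closure of the step formula E(x, y) ∧ value(x) < value(y), which behaves like a one-register automaton that remembers the previous value. The names `increasing_path_pairs`, `edges`, and `value` are hypothetical.

```python
from collections import deque


def increasing_path_pairs(edges, value):
    """Return all (u, v) joined by a non-empty path with strictly increasing values."""
    # Step relation: the formula under the TC operator,
    # E(x, y) AND value(x) < value(y).
    step = {}
    for x, y in edges:
        if value[x] < value[y]:
            step.setdefault(x, set()).add(y)

    # Transitive closure of the step relation, computed by one
    # breadth-first search per source node.
    result = set()
    for source in value:
        seen = set()
        queue = deque(step.get(source, ()))
        while queue:
            node = queue.popleft()
            if node in seen:
                continue
            seen.add(node)
            queue.extend(step.get(node, ()))
        result.update((source, target) for target in seen)
    return result


# Toy graph (assumed for illustration): edges plus one numeric value per node.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("b", "d")]
value = {"a": 1, "b": 5, "c": 3, "d": 9}
pairs = increasing_path_pairs(edges, value)
print(sorted(pairs))  # [('a', 'b'), ('a', 'd'), ('b', 'd')]
```

The breadth-first search here runs in polynomial time and is chosen for readability; it is not a logspace procedure. The sketch only illustrates the semantics of a transitive closure over a step formula that compares data values, which is the kind of query the abstract places in nondeterministic logspace.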