🤖 AI Summary
This paper addresses the lack of a unified query standard for JSON document databases by formalizing a fragment of MongoDB’s aggregation framework that could serve as the starting point for such a standard. It specifies the syntax and formal semantics of a selected set of operators, relates the fragment to known relational query languages, explains where the proposed semantics departs from MongoDB’s current implementation, and derives algebraic transformations for equivalence-based query optimization. The key contributions are: (1) a formal foundation for querying JSON documents; (2) a characterization of the correspondence between aggregation pipelines and relational query languages; and (3) a set of algebraic transformation rules that can support query rewriting and optimization.
📝 Abstract
In this technical report, we present a formalisation of the MongoDB aggregation framework. Our aim is to identify a fragment that could serve as the starting point for an industry-wide standard for querying JSON document databases. We provide a syntax and formal semantics for a set of selected operators, and we show how this fragment relates to known relational query languages. We explain how our semantics differs from the current implementation of MongoDB, and justify our choices. We provide a set of algebraic transformations that can be used for query optimisation.
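To give a feel for what an algebraic transformation over aggregation pipelines looks like, here is a minimal Python sketch. It is not taken from the report: the helper names (`matches`, `run_pipeline`, `merge_matches`) are hypothetical, and the toy semantics covers only `$match` with top-level field equality and `$and`, which is far simpler than MongoDB's actual matching rules. Under those assumptions, it illustrates one rewrite that preserves results: two adjacent `$match` stages can be fused into a single stage whose condition is their conjunction.

```python
from typing import Any, Dict, List

Doc = Dict[str, Any]
Stage = Dict[str, Any]


def matches(doc: Doc, cond: Doc) -> bool:
    """Toy condition check (an assumption of this sketch, not MongoDB's
    real semantics): top-level field equality, plus $and."""
    for key, value in cond.items():
        if key == "$and":
            if not all(matches(doc, sub) for sub in value):
                return False
        elif key not in doc or doc[key] != value:
            return False
    return True


def run_pipeline(docs: List[Doc], pipeline: List[Stage]) -> List[Doc]:
    """Apply each stage in order; only $match is supported here."""
    for stage in pipeline:
        if "$match" in stage:
            docs = [d for d in docs if matches(d, stage["$match"])]
        else:
            raise ValueError(f"unsupported stage: {stage}")
    return docs


def merge_matches(pipeline: List[Stage]) -> List[Stage]:
    """Rewrite adjacent $match stages into one $match using $and."""
    out: List[Stage] = []
    for stage in pipeline:
        if "$match" in stage and out and "$match" in out[-1]:
            prev = out.pop()["$match"]
            out.append({"$match": {"$and": [prev, stage["$match"]]}})
        else:
            out.append(stage)
    return out


# Illustrative data; the collection and field names are made up.
docs = [
    {"name": "Ann", "city": "Oslo", "active": True},
    {"name": "Bob", "city": "Oslo", "active": False},
    {"name": "Cy", "city": "Rome", "active": True},
]
pipeline = [{"$match": {"city": "Oslo"}}, {"$match": {"active": True}}]
optimized = merge_matches(pipeline)

# The rewritten pipeline has one stage and returns the same documents.
assert run_pipeline(docs, pipeline) == run_pipeline(docs, optimized)
```

A formal semantics is what allows an equivalence like this to be proved for all inputs rather than checked on a sample, which is why the report pairs its transformations with a precise definition of each operator.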