🤖 AI Summary
The SQL standard specifies recursive queries ambiguously and database vendors implement them inconsistently, which exposes recursive SQL to three classes of errors: database errors, incorrect results, and non-termination. To address this, the paper proposes a calculus that automatically derives mathematical properties from embedded recursive queries and, depending on the database backend, rejects queries that may trigger those errors. It also introduces TyQL, a Scala implementation of safe, recursive language-integrated query that uses named tuples and type-level pattern matching to ensure query portability and safety. According to the abstract, TyQL shows no performance penalty compared to raw SQL strings, while recursive queries unlock a three-orders-of-magnitude speedup over non-recursive SQL queries.
📝 Abstract
Performance-critical industrial applications, including large-scale program, network, and distributed system analyses, rely on fixed-point computations. The introduction of recursive common table expressions (CTEs) using the WITH RECURSIVE keyword in SQL:1999 extended the ability of relational database systems to handle fixed-point computations, unlocking significant performance advantages by allowing computation to move closer to the data. Yet with recursion, SQL becomes a Turing-complete programming language and, with that, inherits unrecoverable safety and correctness risks. SQL itself lacks a fixed semantics, as the SQL specification is written in natural language, full of ambiguities that database vendors resolve in divergent ways. As a result, reasoning about the correctness of recursive SQL programs must rely on isolated mathematical properties of queries rather than wrestling a unified formal model out of a language with notoriously inconsistent semantics. To address these challenges, we propose a calculus that automatically derives mathematical properties from embedded recursive queries and, depending on the database backend, rejects queries that may lead to the three classes of recursive query errors: database errors, incorrect results, and non-termination. We introduce TyQL, a practical implementation in Scala for safe, recursive language-integrated query. Using named tuples and type-level pattern matching, TyQL ensures query portability and safety, showing no performance penalty compared to raw SQL strings while unlocking a three-orders-of-magnitude speedup over non-recursive SQL queries.
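To make the non-termination class concrete, here is a minimal sketch that is not taken from the paper and does not use TyQL. It runs a WITH RECURSIVE query through Python's standard `sqlite3` module; the `edge` table, the `reach` CTE, and the sample graph are hypothetical names and data chosen for illustration. The query computes the nodes reachable from node 1 in a graph that contains a cycle.

```python
import sqlite3

# In-memory database with a small directed graph (hypothetical sample data).
# The edges 1 -> 2 -> 3 -> 1 form a cycle; 3 -> 4 leaves it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edge (src INTEGER, dst INTEGER)")
conn.executemany(
    "INSERT INTO edge VALUES (?, ?)",
    [(1, 2), (2, 3), (3, 1), (3, 4)],
)

# Fixed-point computation: start from node 1 and repeatedly follow edges.
# UNION (set semantics) discards rows that were already derived, so the
# recursion stops once no new node is found, even though the graph is cyclic.
REACHABLE_FROM_1 = """
WITH RECURSIVE reach(node) AS (
    SELECT 1
    UNION
    SELECT edge.dst FROM edge JOIN reach ON edge.src = reach.node
)
SELECT node FROM reach ORDER BY node
"""

reachable = [row[0] for row in conn.execute(REACHABLE_FROM_1)]
print(reachable)
```

On this cyclic data, replacing `UNION` with `UNION ALL` keeps duplicates, so SQLite would re-derive the same nodes indefinitely. That one-keyword difference is the kind of hazard the abstract groups under non-termination, and other database systems may behave differently for the same query text.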