🤖 AI Summary
Existing columnar databases lack effective optimization for queries that intertwine relational and array operations. This work proposes A3D-RA, an extended relational algebra that natively supports array attributes, and formally defines its semantics for the first time. Building upon this foundation, we develop a modular, backend-agnostic optimization framework equipped with a complete set of equivalence-preserving transformation rules. The framework enables polynomial-time enumeration of optimal execution plans for non-join operations. Experimental evaluation across three mainstream analytical database engines demonstrates that integrating this optimization layer consistently yields significant performance improvements on real-world workloads.
📝 Abstract
Modern analytical workloads increasingly combine relational data with array-valued attributes. While columnar database systems efficiently process such workloads, their ability to optimize queries that interleave relational operators with array manipulations remains limited. This paper introduces A3D-RA, an extended relational algebra supporting array-valued attributes, together with a comprehensive framework for algebraic reasoning and optimization. We formalize its data model and semantics, develop a complete set of equivalence-preserving transformation rules capturing pairwise interactions between relational and array operators, and propose a plan enumeration strategy with an optimality guarantee that remains polynomial in all non-join operators. We design A3D-RA as a modular, backend-independent optimization layer that can be instantiated over existing analytical database systems. Experimental results across three high-performance engines on a real-world workload show consistent performance gains enabled by the proposed algebraic optimization layer.