🤖 AI Summary
In modern data analytics, entities and their attribute relationships often span multiple granularities, complicating critical attribute derivation and target entity retrieval. Existing OLAP operators, window functions, and aggregation constructs suffer from limitations in composability, formal expressiveness, and runtime performance. To address this, we propose Multi-Relational Algebra (MRA), a novel algebraic framework grounded in the “slice”, a semantic unit comprising a region tuple and an associated feature table, thereby transcending conventional single-tuple or single-column constraints. MRA supports dynamic heterogeneous schema modeling and cross-schema composition, and introduces a formal algebraic system, a slice-based computational model, a unified logical execution engine, and a query optimization framework tailored for data insight discovery. The system has been deployed in production, serving millions of operators daily and handling complex analytical tasks that resist modeling under traditional relational paradigms.
📝 Abstract
A range of data-insight analytical tasks involves analyzing a large set of tables of different schemas, possibly induced by various groupings, to find salient patterns. This paper presents Multi-Relational Algebra, an extension of the classic Relational Algebra, to facilitate such transformations and their compositions. Multi-Relational Algebra has two main characteristics. (1) Information Unit. The information unit is a slice $(r, X)$, where $r$ is a (region) tuple and $X$ is a (feature) table. Specifically, a slice can encompass multiple columns, which surpasses the information unit of "a single tuple" or "a group of tuples of one column" in the classic relational algebra. (2) Schema Flexibility. Slices can have varying schemas and are not constrained to a single schema. This flexibility further expands the expressive power of the algebra. Through various examples, we show that Multi-Relational Algebra can effortlessly express many complex analytic problems, some of which are beyond the scope of traditional relational analytics. We have implemented and deployed a service for multi-relational analytics. Due to a unified logical design, we are able to conduct systematic optimization for a variety of seemingly different tasks. Our service has garnered interest from numerous internal teams who have developed data-insight applications using it, and serves millions of operators daily.
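To make the slice $(r, X)$ concrete, the following is a minimal Python sketch of one way such a unit could be represented. It is an illustration under assumptions, not the paper's implementation: the names `Slice`, `slices_by`, and the `sales` table are hypothetical, and grouping is used here only as one possible way to induce slices.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Slice:
    """Hypothetical stand-in for a slice (r, X)."""
    region: dict    # the region tuple r, as column -> value
    features: list  # the feature table X, as a list of row dicts

    def schema(self):
        """Column names of the feature table (empty if it has no rows)."""
        return tuple(self.features[0].keys()) if self.features else ()


def slices_by(table, region_cols, feature_cols):
    """Group `table` on `region_cols`; each group becomes one slice whose
    feature table keeps only `feature_cols`."""
    groups = defaultdict(list)
    for row in table:
        key = tuple(row[c] for c in region_cols)
        groups[key].append({c: row[c] for c in feature_cols})
    return [Slice(dict(zip(region_cols, key)), rows)
            for key, rows in groups.items()]


# Made-up illustrative data.
sales = [
    {"country": "US", "device": "phone", "month": 1, "revenue": 10},
    {"country": "US", "device": "laptop", "month": 2, "revenue": 30},
    {"country": "DE", "device": "phone", "month": 1, "revenue": 20},
]

# Two different groupings induce slices with two different schemas.
by_country = slices_by(sales, ["country"], ["month", "revenue"])
by_device = slices_by(sales, ["device"], ["country", "revenue"])

# Schema flexibility: one collection may hold slices of both schemas.
mixed = by_country + by_device
```

In this sketch each slice carries a multi-column feature table, and `mixed` holds slices with differing schemas side by side, which mirrors the two characteristics described in the abstract.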