🤖 AI Summary
This work addresses the inefficiency faced by data analysts who must repeatedly issue groups of related queries and manually combine their results to find interesting patterns in data. To streamline this process, the paper introduces the ANALYZE operator, an intentional cube-querying operator whose semantics are defined as a tuple of five internal facilitator cube queries: the original query over a specified subset of the data space, sibling queries that contrast it with peer subspaces, and drill-down queries that explore finer levels of granularity. The authors prove that merging the facilitator queries into fewer queries yields exactly the same result, which gives a multi-query optimization (MQO) strategy, and they propose three execution algorithms: Min-MQO (no merging), Max-MQO (all facilitator queries merged into one), and Mid-MQO (only a subset merged). Experiments show that Mid-MQO performs consistently well across several contexts, Min-MQO always trails it, and Max-MQO excels when the sibling queries are sizable and overlap significantly.
📝 Abstract
In their hunt for highlights, i.e., interesting patterns in the data, data analysts have to issue groups of related queries and manually combine their results. To the extent that the analyst's goals are based on an intention about what to discover (e.g., contrast a query result with peer ones, verify a pattern against a broader range of data in the data space, etc.), integrating intentional query operators into analytical engines can enhance the efficiency of these analytical tasks. In this paper, we introduce, with well-defined semantics, the ANALYZE operator, a novel intentional cube-querying operator that provides a 360-degree view of data. We define the semantics of an ANALYZE query as a tuple of five internal facilitator cube queries that (a) report on the specifics of a particular subset of the data space, which is part of the query specification and to which we refer as the original query, (b) contrast the result with results from peer subspaces, via sibling queries, and (c) explore the data space at lower levels of granularity via drill-down queries. We theoretically prove that we can obtain the exact same result by merging the facilitator cube queries into a smaller number of queries, which effectively introduces a multi-query optimization (MQO) strategy for executing an ANALYZE query. We propose three alternative algorithms: (a) a simple execution without optimizations (Min-MQO), (b) a total merging of all the facilitator queries into a single one (Max-MQO), and (c) an intermediate strategy, Mid-MQO, that merges only a subset of the facilitator queries. Our experimentation demonstrates that Mid-MQO achieves consistently strong performance across several contexts, Min-MQO always trails it, and Max-MQO excels for queries whose siblings are sizable and overlap significantly.
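The merging idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the fact table, the column names, and the helpers `cube_query`, `run_separately`, and `run_merged` are all hypothetical, and only three facilitator queries (one original, one sibling, one drill-down) are modeled rather than the five the operator defines. The sketch assumes a distributive aggregate (SUM), so that a single scan at the finest granularity can be rolled up to answer each facilitator query.

```python
from collections import defaultdict

# Hypothetical toy fact table: (region, country, year, quarter, sales).
COLUMNS = ("region", "country", "year", "quarter", "sales")
FACTS = [
    ("Europe", "Italy", 2023, "Q1", 10),
    ("Europe", "Italy", 2023, "Q2", 15),
    ("Europe", "Italy", 2024, "Q1", 20),
    ("Europe", "France", 2023, "Q1", 7),
    ("Europe", "France", 2024, "Q2", 9),
    ("Europe", "Spain", 2024, "Q1", 4),
    ("Asia", "Japan", 2023, "Q1", 30),
]


def cube_query(rows, where, group_by):
    """Sum `sales` over the rows matching `where`, grouped by `group_by`."""
    totals = defaultdict(int)
    for row in rows:
        rec = dict(zip(COLUMNS, row))
        if where(rec):
            totals[tuple(rec[c] for c in group_by)] += rec["sales"]
    return dict(totals)


def run_separately():
    """Unmerged execution: one scan of the facts per facilitator query."""
    original = cube_query(FACTS, lambda r: r["country"] == "Italy", ["year"])
    siblings = cube_query(
        FACTS,
        lambda r: r["region"] == "Europe" and r["country"] != "Italy",
        ["country", "year"],
    )
    drill_down = cube_query(
        FACTS, lambda r: r["country"] == "Italy", ["year", "quarter"]
    )
    return original, siblings, drill_down


def run_merged():
    """Merged execution: one scan at the finest granularity, then split."""
    merged = cube_query(
        FACTS, lambda r: r["region"] == "Europe", ["country", "year", "quarter"]
    )
    original, siblings, drill_down = defaultdict(int), defaultdict(int), {}
    for (country, year, quarter), total in merged.items():
        if country == "Italy":
            original[(year,)] += total           # roll up quarter -> year
            drill_down[(year, quarter)] = total  # already at drill-down level
        else:
            siblings[(country, year)] += total   # peers, rolled up to year
    return dict(original), dict(siblings), drill_down


# Both execution plans return the same three results.
assert run_separately() == run_merged()
```

In this toy setting the merged plan trades three scans for one wider scan plus in-memory post-processing. Whether that trade pays off depends on how much the facilitator queries overlap, which is consistent with the abstract's observation that full merging helps most when the siblings are sizable and overlap significantly.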