A Novel Approach to Translate Structural Aggregation Queries to MapReduce Code

📅 2025-02-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Efficiently mapping structural array aggregations (circular, grid, hierarchical, and sliding-window) from array databases such as SciDB onto MapReduce remains challenging because array-centric operations do not map cleanly onto key-value abstractions. Method: This paper introduces the first array-semantic-aware framework for automatically translating AQL to MapReduce. It models array-specific structural aggregations as rewritable semantic rules, integrates aggregation-aware data partitioning, and generates multi-stage MapReduce jobs. It also supports user-defined aggregation functions without modifying the underlying system. Contribution/Results: Experimental evaluation shows that the generated code achieves up to 10.84× speedup over handwritten MapReduce implementations while preserving semantic correctness and substantially reducing development effort. The framework bridges the gap between high-level array query languages and optimized distributed code generation, enabling scalable, declarative array analytics on MapReduce infrastructures.
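To make the semantic mismatch concrete, the following is a minimal sketch of how a one-dimensional sliding-window aggregation could be phrased as a map step and a reduce step. It is an illustration only, not the paper's generated code: the names `sliding_map` and `sliding_aggregate` are hypothetical, the shuffle is simulated in memory, and no real MapReduce runtime is involved. The key idea is that each array cell is emitted once for every window that contains it, keyed by the window's start index.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterator, List, Sequence, Tuple


def sliding_map(i: int, value: float, width: int, n: int) -> Iterator[Tuple[int, float]]:
    """Emit (window_start, value) for every full window that covers index i.

    Hypothetical helper: a window starting at s covers indices s .. s+width-1,
    and only windows that fit entirely inside the array (0 <= s <= n-width) are kept.
    """
    first = max(0, i - width + 1)
    last = min(i, n - width)
    for start in range(first, last + 1):
        yield start, value


def sliding_aggregate(
    data: Sequence[float],
    width: int,
    agg: Callable[[List[float]], float],
) -> Dict[int, float]:
    """Simulate map -> shuffle -> reduce for a sliding-window aggregation."""
    n = len(data)
    groups: Dict[int, List[float]] = defaultdict(list)
    # Map phase plus in-memory shuffle: group emitted values by window start.
    for i, value in enumerate(data):
        for key, v in sliding_map(i, value, width, n):
            groups[key].append(v)
    # Reduce phase: apply the (possibly user-defined) aggregation per window.
    return {start: agg(values) for start, values in sorted(groups.items())}


window_sums = sliding_aggregate([1, 2, 3, 4, 5], width=3, agg=sum)
window_maxes = sliding_aggregate([1, 2, 3, 4, 5], width=3, agg=max)
```

Because each cell is replicated up to `width` times in the map output, the shuffle volume grows with the window size, which is one reason a partitioning scheme that is aware of the aggregation shape can matter in practice.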

📝 Abstract
Data management applications are growing and require more attention, especially in the "big data" era. Thus, supporting such applications with novel and efficient algorithms that achieve higher performance is critical. Array database management systems are one way to support these applications by dealing with data represented in n-dimensional data structures. For instance, software like SciDB and RasDaMan can be powerful tools to achieve the required performance on large-scale problems with multidimensional data. Like their relational counterparts, these management systems support specific array query languages as the user interface. As a popular programming model, MapReduce allows large-scale data analysis, facilitates query processing, and is used as a DB engine. Nevertheless, one major obstacle is the low productivity of developing MapReduce applications. Unlike high-level declarative languages such as SQL, MapReduce jobs are written in a low-level descriptive language, often requiring massive programming effort and complicated debugging. This work presents a system that translates array queries expressed in SciDB's Array Query Language (AQL) into MapReduce jobs. We focus on translating several unique structural aggregations, including circular, grid, hierarchical, and sliding aggregations. Unlike traditional aggregations in relational DBs, these structural aggregations are designed explicitly for array manipulation. Thus, our work can be considered an array-view counterpart of existing SQL-to-MapReduce translators like HiveQL and YSmart. Our translator supports structural aggregations over arrays to cover a variety of array manipulations. It also supports user-defined aggregation functions with minimal user effort. We show that our translator can generate optimized MapReduce code that outperforms short handwritten code by up to 10.84x.
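As a rough illustration of what a grid aggregation looks like once expressed in key-value terms, the sketch below simulates the map, shuffle, and reduce phases in plain Python. It is not the translator's output and does not use AQL or a real MapReduce runtime; the function names (`grid_map`, `shuffle`, `grid_aggregate`) and the 2x2 tile size are assumptions made for the example. Each cell is keyed by the index of the non-overlapping tile that contains it, and the reducer applies an arbitrary aggregation function per tile.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Coord = Tuple[int, int]
Cell = Tuple[Coord, float]  # ((row, col), value)


def grid_map(cell: Cell, tile: Tuple[int, int]) -> Tuple[Coord, float]:
    """Map step (hypothetical): key each cell by the tile that contains it."""
    (row, col), value = cell
    tile_rows, tile_cols = tile
    return (row // tile_rows, col // tile_cols), value


def shuffle(pairs: Iterable[Tuple[Coord, float]]) -> Dict[Coord, List[float]]:
    """In-memory stand-in for the MapReduce shuffle: group values by key."""
    groups: Dict[Coord, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def grid_aggregate(
    cells: Iterable[Cell],
    tile: Tuple[int, int],
    agg: Callable[[List[float]], float],
) -> Dict[Coord, float]:
    """Run map -> shuffle -> reduce, with `agg` as the reduce function."""
    mapped = (grid_map(cell, tile) for cell in cells)
    return {key: agg(values) for key, values in shuffle(mapped).items()}


# A 4x4 array flattened into (coordinate, value) cells.
array = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
cells = [((i, j), float(v)) for i, row in enumerate(array) for j, v in enumerate(row)]

tile_sums = grid_aggregate(cells, tile=(2, 2), agg=sum)


def value_range(values: List[float]) -> float:
    """Example of a user-defined aggregation: max minus min within a tile."""
    return max(values) - min(values)


tile_ranges = grid_aggregate(cells, tile=(2, 2), agg=value_range)
```

Passing the aggregation as a plain callable mirrors the idea that a user-defined function can be plugged into the reduce step without changing the surrounding job structure.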
Problem

Research questions and friction points this paper is trying to address.

Complex Structural Data Queries
MapReduce
Big Data Acceleration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Complex Array Operations
MapReduce Optimization
Customizable Operations Support
Ahmed M. Abdelmoniem
Department of Computer Science, Faculty of Computers and Information, Assiut University
Sameh Abdulah
Senior Research Scientist
High Performance Computing, Statistical Computing, Large-scale Computing
Walid Atwa
Department of Computer Science, Faculty of Computers and Information, Menoufia University