AI Summary
This paper addresses communication cost optimization for full join queries in heterogeneous distributed systems. The goal is to minimize the maximum per-machine communication cost across $p$ machines with different capabilities, where each machine has its own cost function mapping the amount of data it receives to a cost. This departs from the conventional homogeneous MPC assumption, and it goes beyond the linear cost functions of prior work by also allowing more general cost functions. For single-round algorithms, the paper gives a lower bound and a matching upper bound for any full conjunctive query when all relations have the same cardinality. When cardinalities differ, it gives a general lower bound and matching upper bounds for specific queries: the Cartesian product, the join, the star query, and the triangle query. The approach extends the HyperCube framework with a hash-partitioning strategy adapted to heterogeneous machines, combined with information-theoretic lower-bound arguments.
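To make the objective concrete, here is a minimal Python sketch; it is not the paper's algorithm. It routes the triangle query $R(x,y), S(y,z), T(z,x)$ with plain homogeneous HyperCube shares $k \times k \times k$ (so $p = k^3$), counts the tuples each machine receives, and evaluates $\max_i c_i(\text{load}_i)$. Assumptions: tuple counts stand in for bits (fixed tuple size), hash functions are simulated with memoized random buckets, and the linear costs $c_i(x) = x / s_i$ with hypothetical per-machine speeds `speeds` are purely illustrative. Because equal shares give every machine roughly the same load, the slowest machine dominates the maximum cost, which illustrates why heterogeneous cost functions call for an uneven allocation.

```python
import random
from itertools import product


def hypercube_triangle_loads(R, S, T, k, seed=0):
    """Route R(x,y), S(y,z), T(z,x) to a k x k x k grid of machines.

    Machine (bx, by, bz) handles x, y, z values hashing to bx, by, bz.
    Returns a dict: machine coordinate -> number of tuples received.
    """
    rng = random.Random(seed)
    table = {}

    def h(var, value):
        # Simulated hash: an independent, memoized random bucket per variable.
        key = (var, value)
        if key not in table:
            table[key] = rng.randrange(k)
        return table[key]

    loads = {m: 0 for m in product(range(k), repeat=3)}
    for x, y in R:  # z is unconstrained, so replicate along the z-axis
        for bz in range(k):
            loads[(h("x", x), h("y", y), bz)] += 1
    for y, z in S:  # x is unconstrained
        for bx in range(k):
            loads[(bx, h("y", y), h("z", z))] += 1
    for z, x in T:  # y is unconstrained
        for by in range(k):
            loads[(h("x", x), by, h("z", z))] += 1
    return loads


def max_cost(loads, cost_fns):
    """Objective: the largest cost over machines, cost_fns[m](load of m)."""
    return max(cost_fns[m](load) for m, load in loads.items())


# Hypothetical instance: random binary relations over a domain of 50 values.
rng = random.Random(42)
R = {(rng.randrange(50), rng.randrange(50)) for _ in range(300)}
S = {(rng.randrange(50), rng.randrange(50)) for _ in range(300)}
T = {(rng.randrange(50), rng.randrange(50)) for _ in range(300)}

k = 2
loads = hypercube_triangle_loads(R, S, T, k)
# Hypothetical linear costs c_m(x) = x / s_m: machine (0, 0, 0) is 4x slower.
speeds = {m: (0.25 if m == (0, 0, 0) else 1.0) for m in loads}
cost_fns = {m: (lambda x, s=speeds[m]: x / s) for m in loads}
worst = max_cost(loads, cost_fns)
print(loads[(0, 0, 0)], worst)
```

Every tuple is replicated to exactly $k$ machines, so the total communication is $k(|R| + |S| + |T|)$ regardless of the hash; only its split across machines depends on the hash and, in the heterogeneous setting, on how shares are chosen.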
Abstract
We study the problem of computing a full Conjunctive Query in parallel using $p$ heterogeneous machines. Our computational model is similar to the MPC model, but each machine has its own cost function mapping from the number of bits it receives to a cost. An optimal algorithm should minimize the maximum cost across all machines. We consider algorithms over a single communication round and give a lower bound and a matching upper bound for databases where each relation has the same cardinality. We do this both for linear cost functions, as in previous work, and for more general cost functions. For databases with relations of different cardinalities, we also find a lower bound, and give matching upper bounds for specific queries like the Cartesian product, the join, the star query, and the triangle query. Our approach is inspired by the HyperCube algorithm, but there are additional challenges involved when machines have heterogeneous cost functions.
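As a back-of-the-envelope illustration of how heterogeneity enters such bounds, consider the Cartesian product $R \times S$ with $|R| = |S| = n$ under linear costs $c_i(x) = x / s_i$, where $s_i$ is a hypothetical per-machine speed and $x$ counts tuples. If machine $i$ receives $a_i$ tuples of $R$ and $b_i$ of $S$, it can produce at most $a_i b_i$ output pairs; a cost of at most $L$ forces $a_i + b_i \le L s_i$, hence $a_i b_i \le (L s_i / 2)^2$. Covering all $n^2$ pairs requires $\sum_i (L s_i / 2)^2 \ge n^2$, i.e. $L \ge 2n / \sqrt{\sum_i s_i^2}$. The sketch below evaluates this simple fractional volume bound; it is an illustration under these assumptions, not necessarily the bound proved in the paper, and it ignores integrality. With $p$ identical unit-speed machines it reduces to $2n / \sqrt{p}$, which the standard $\sqrt{p} \times \sqrt{p}$ HyperCube grid achieves when $\sqrt{p}$ divides $n$.

```python
import math


def cartesian_volume_lower_bound(n, speeds):
    """Fractional lower bound on max_i load_i / s_i for R x S, |R| = |S| = n.

    Assumes linear costs c_i(x) = x / s_i with x counted in tuples.
    From sum_i (L * s_i / 2) ** 2 >= n ** 2 we get L >= 2n / sqrt(sum s_i^2).
    """
    return 2 * n / math.sqrt(sum(s * s for s in speeds))


def homogeneous_grid_load(n, q):
    """Per-machine load of the q x q HyperCube grid for R x S (p = q * q).

    Each machine receives one part of R and one part of S. Assumes an even
    split of each relation into q parts of at most ceil(n / q) tuples.
    """
    return 2 * math.ceil(n / q)


# Homogeneous case, p = 16 unit-speed machines.
print(cartesian_volume_lower_bound(1000, [1.0] * 16))
print(homogeneous_grid_load(1000, 4))
# Hypothetical heterogeneous case: one machine twice as fast, 12 unit-speed ones.
print(cartesian_volume_lower_bound(1000, [2.0] + [1.0] * 12))
```

In the homogeneous case the bound and the grid load coincide. In this bound speeds enter through their squares, so one machine of speed 2 counts as much as four unit-speed machines, which is why a uniform grid is no longer the natural allocation once machines differ.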