Distributed Linearly Separable Computation with Arbitrary Heterogeneous Data Assignment

📅 2026-01-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work studies the fundamental trade-off between the computable dimension of the task function and the communication cost for linearly separable computation in a master-worker architecture, where the data assignment across workers is given in advance and workers may hold different numbers of datasets. By characterizing the structure of the data assignment, the paper proposes a universal computing scheme and a universal converse bound that apply to arbitrary heterogeneous assignments, first under integer communication costs and then extended to fractional communication costs. The scheme removes the reliance of prior approaches on homogeneous or carefully designed assignments (e.g., the cyclic assignment), and the scheme and the converse bound coincide under some parameter regimes.

📝 Abstract

Distributed linearly separable computation is a fundamental problem in large-scale distributed systems, requiring the computation of linearly separable functions over different datasets across distributed workers. This paper studies a heterogeneous distributed linearly separable computation problem with one master and N distributed workers. The linearly separable task function involves Kc linear combinations of K messages, where each message is a function of one dataset. Unlike existing homogeneous settings, which assume that each worker holds the same number of datasets and that the data assignment is carefully designed and controlled by the data center (e.g., the cyclic assignment), we consider a more general setting with arbitrary heterogeneous data assignment across workers, where 'arbitrary' means that the data assignment is given in advance and 'heterogeneous' means that the workers may hold different numbers of datasets. Our objective is to characterize the fundamental tradeoff between the computable dimension of the task function and the communication cost under arbitrary heterogeneous data assignment. Under the constraint of integer communication costs, we propose a universal computing scheme and a universal converse bound for arbitrary heterogeneous data assignment by characterizing the structure of the data assignment; the two coincide under some parameter regimes. We then extend the proposed computing scheme and converse bound to the case of fractional communication costs.
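To make the setting concrete, the following is a minimal toy sketch of the problem described in the abstract, not the paper's scheme. It assumes a hypothetical demand matrix F (Kc x K), hand-picked encoding rows for each worker, and a fixed heterogeneous assignment. Each worker may only combine the messages of datasets it holds, and the master linearly decodes the Kc demanded combinations from what it receives. The per-worker symbol count printed at the end is only an illustration; the abstract does not spell out how the paper defines communication cost.

```python
import numpy as np

# Toy sizes (all hypothetical): K datasets, N workers, Kc demanded
# combinations, and messages of length L.
K, N, Kc, L = 4, 3, 2, 5

# Arbitrary heterogeneous assignment, fixed in advance:
# worker n holds the datasets in assignment[n] (sizes 2, 3 and 1).
assignment = [{0, 1}, {1, 2, 3}, {3}]

rng = np.random.default_rng(0)
W = rng.standard_normal((K, L))        # row k stands in for message W_k = f_k(D_k)
F = np.array([[1.0, 1.0, 1.0, 1.0],
              [1.0, 2.0, 0.0, 0.0]])   # hypothetical Kc x K demand matrix

# Hand-picked encoding rows (an assumption, not the paper's construction).
# Each row is one transmitted linear combination of the K messages.
encoders = [
    np.array([[1.0, 1.0, 0.0, 0.0],
              [1.0, 2.0, 0.0, 0.0]]),  # worker 0 sends two symbols
    np.array([[0.0, 0.0, 1.0, 1.0]]),  # worker 1 sends one symbol
    np.zeros((0, K)),                  # worker 2 stays silent
]

def respects_assignment(encoders, assignment):
    """True if every worker only combines messages of datasets it holds."""
    for S_n, held in zip(encoders, assignment):
        not_held = np.array([k not in held for k in range(S_n.shape[1])])
        if np.any(S_n[:, not_held] != 0):
            return False
    return True

def decode(encoders, F, W):
    """Recover F @ W from the workers' coded transmissions, if possible."""
    S = np.vstack(encoders)            # all coefficient rows seen by the master
    received = S @ W                   # the symbols actually transmitted
    # Find D with D @ S = F by least squares on the transposed system.
    D = np.linalg.lstsq(S.T, F.T, rcond=None)[0].T
    if not np.allclose(D @ S, F):
        raise ValueError("demand is not in the span of the transmissions")
    return D @ received

result = decode(encoders, F, W)
symbols_sent = [S_n.shape[0] for S_n in encoders]
print(respects_assignment(encoders, assignment), symbols_sent)
```

In this toy instance the first demanded combination is obtained by adding worker 0's first symbol to worker 1's symbol, and the second is worker 0's second symbol directly. Changing the assignment or F and rerunning decode shows quickly whether a given set of transmissions still suffices.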

Problem

Research questions and friction points this paper is trying to address.

distributed linearly separable computation

heterogeneous data assignment

communication cost

computable dimension

Innovation

Methods, ideas, or system contributions that make the work stand out.

distributed computing

linearly separable computation

heterogeneous data assignment

communication-computation tradeoff