🤖 AI Summary
This work addresses numerical instability, high decoding overhead, and load imbalance in gradient coding and coded matrix multiplication on heterogeneous servers for distributed optimization. We propose a novel binary fractional repetition coding (BFRC) scheme—the first to extend fractional repetition coding to the binary domain—thereby eliminating numerical errors inherent in real/complex arithmetic. BFRC unifies gradient coding and coded matrix multiplication under a single framework and yields two new multiplication protocols, achieving superior trade-offs across communication, computation, and redundancy. We theoretically establish its fault tolerance, near-perfect load balancing, and near-optimal decoding complexity. Experiments demonstrate that BFRC significantly improves numerical stability, reduces decoding overhead by up to 42%, and maintains high efficiency and scalability in heterogeneous environments, outperforming state-of-the-art real-valued coding schemes.
📝 Abstract
This paper addresses the gradient coding and coded matrix multiplication problems in distributed optimization and coded computing. We present a numerically stable binary coding method that overcomes the drawbacks of the *Fractional Repetition Coding* gradient coding scheme proposed by Tandon et al., and that can also be leveraged by coded computing networks whose servers are heterogeneous. Specifically, we propose a construction for fractional repetition gradient coding that keeps the generator matrix close to perfectly balanced for any set of coding parameters while admitting a low-complexity decoding step. The proposed binary encoding avoids operations over the real and complex numbers, which are inherently numerically unstable, thereby enabling numerically stable distributed encodings of the partial gradients. We then draw connections between gradient coding and coded matrix multiplication: we show that any gradient coding scheme can be extended to coded matrix multiplication. Furthermore, we show how the proposed binary gradient coding scheme can be used to construct two different coded matrix multiplication schemes, each achieving a different trade-off.
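To make the fractional repetition idea behind such binary schemes concrete, the following is a minimal sketch (not the paper's actual construction): workers are split into groups of size s+1, every worker in a group computes the same sum of gradient partitions, and the master decodes with additions only, since any s stragglers leave at least one survivor per group. All names (`n`, `s`, `frc_partial_gradients`, `frc_decode`) and the one-partition-per-worker simplification are assumptions made for illustration.

```python
import numpy as np

def frc_partial_gradients(gradients, n, s):
    """Each of the n workers returns the sum of the gradient partitions
    assigned to its group; the s+1 workers in a group hold identical data,
    so the assignment matrix is binary (entries in {0, 1})."""
    group_size = s + 1
    partial = np.zeros(n)
    for w in range(n):
        g = w // group_size
        partial[w] = gradients[g * group_size:(g + 1) * group_size].sum()
    return partial

def frc_decode(partial, alive, n, s):
    """Binary decoding: add one surviving partial sum per group.
    No real-valued matrix inversion is needed, so no numerical error
    is introduced beyond the additions themselves."""
    group_size = s + 1
    total = 0.0
    for g in range(n // group_size):
        members = [w for w in range(g * group_size, (g + 1) * group_size)
                   if w in alive]
        assert members, "more than s stragglers hit a single group"
        total += partial[members[0]]
    return total

# Toy run: n = 6 workers, tolerate s = 2 stragglers (workers 1 and 5 here).
n, s = 6, 2
gradients = np.arange(1.0, 7.0)          # 6 scalar gradient partitions
partial = frc_partial_gradients(gradients, n, s)
full = frc_decode(partial, alive={0, 2, 3, 4}, n=n, s=s)
print(full)  # 21.0 == gradients.sum()
```

Note the design point the abstract emphasizes: because the code is binary, decoding reduces to selecting and summing surviving responses, avoiding the ill-conditioned inversions that real- or complex-valued schemes require.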