ZCCL: Significantly Improving Collective Communication With Error-Bounded Lossy Compression

📅 2025-02-25
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address performance bottlenecks in large-scale MPI collective communication caused by large-message transfers, this paper introduces ZCCL—the first error-bounded lossy compression framework specifically designed for collective operations. Its core contributions are: (1) the fZ-light compressor—an ultra-lightweight, high-throughput design ensuring strict, user-controllable error bounds; (2) the first systematic integration of error-bounded lossy compression into mainstream collectives (e.g., Allgather, Allreduce), enabling end-to-end provable and tunable error guarantees; and (3) a compression-communication co-optimization mechanism tailored to both data-movement and computation-intensive collective primitives. Evaluated on real scientific datasets, ZCCL achieves 1.9×–8.9× speedup over native MPI, significantly reduces communication volume, and rigorously satisfies user-specified absolute or relative error thresholds.

📝 Abstract
With the ever-increasing computing power of supercomputers and the growing scale of scientific applications, the efficiency of MPI collective communication has become a critical bottleneck in large-scale distributed and parallel processing. The large message size in MPI collectives is particularly concerning because it can significantly degrade overall parallel performance. To address this issue, prior research simply applies off-the-shelf fixed-rate lossy compressors in the MPI collectives, leading to suboptimal performance, limited generalizability, and unbounded errors. In this paper, we propose a novel solution, called ZCCL, which leverages error-bounded lossy compression to significantly reduce the message size, resulting in a substantial reduction in communication costs. The key contributions are three-fold. (1) We develop two general, optimized lossy-compression-based frameworks for both types of MPI collectives (collective data movement as well as collective computation), based on their particular characteristics. Our framework not only reduces communication costs but also preserves data accuracy. (2) We customize fZ-light, an ultra-fast error-bounded lossy compressor, to meet the specific needs of collective communication. (3) We integrate ZCCL into multiple collectives, such as Allgather, Allreduce, Scatter, and Broadcast, and perform a comprehensive evaluation based on real-world scientific application datasets. Experiments show that our solution outperforms the original MPI collectives as well as multiple baselines by 1.9×–8.9×.
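The fZ-light compressor mentioned in the abstract follows the prediction-plus-quantization recipe common to error-bounded lossy compressors (predict each value, then quantize the residual so the reconstruction error stays within the user's bound). A minimal sketch of that general idea, assuming a 1D previous-value predictor; all function names and details here are illustrative, not the actual fZ-light implementation:

```python
import numpy as np

def compress(data, eb):
    """Quantize residuals against a previous-value predictor.

    Guarantees |decompressed - original| <= eb for every element
    (absolute error bound), because each residual is rounded to the
    nearest multiple of 2*eb.  Illustrative sketch, not fZ-light.
    """
    codes = np.empty(len(data), dtype=np.int64)
    prev = 0.0  # predictor state, mirrored by the decompressor
    for i, x in enumerate(data):
        q = int(round((x - prev) / (2 * eb)))  # integer quantization code
        codes[i] = q
        prev = prev + q * 2 * eb  # track the *reconstructed* value
    return codes

def decompress(codes, eb):
    """Rebuild the approximation from the integer codes."""
    out = np.empty(len(codes), dtype=np.float64)
    prev = 0.0
    for i, q in enumerate(codes):
        prev = prev + q * 2 * eb
        out[i] = prev
    return out
```

In a compression-enabled collective, each rank would run such a compressor on its buffer before the exchange and decompress after, trading a small, bounded accuracy loss for a large reduction in bytes on the wire; the small integer codes are what an entropy/bit-packing stage would then shrink.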
Problem

Research questions and friction points this paper is trying to address.

High communication cost of large-message MPI collectives at scale
Unbounded errors and limited generalizability of off-the-shelf fixed-rate lossy compressors
Need to preserve data accuracy while shrinking communication volume in large-scale scientific applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Error-bounded lossy compression
Customized fZ-light compressor
Integration into MPI collectives
Jiajun Huang
University of California, Riverside, CA 92521
Sheng Di
Argonne National Laboratory, IEEE Senior Member
HPC, Data Compression, Resilience, Cloud/Grid Computing/P2P, Federated Learning
Xiaodong Yu
Stevens Institute of Technology, Hoboken, NJ 07030
Yujia Zhai
University of California, Riverside, CA 92521
Zhaorui Zhang
The Hong Kong Polytechnic University, Department of Computing
LLM and MLSys, HPC, Distributed & Parallel System, Cloud Computing, FPGA
Jinyang Liu
University of Houston, Houston, TX 77204
Xiaoyi Lu
Associate Professor, University of California, Merced
Big Data, High Performance Computing, Cloud Computing, Deep Learning, Distributed Computing
Ken Raffenetti
Argonne National Laboratory
High Performance Computing
Hui Zhou
Argonne National Laboratory, Lemont, IL 60439
Kai Zhao
Florida State University, Tallahassee, FL 32306
Khalid Alharthi
Department of Computer Science, College of Computing and Information Technology, University of Bisha, Bisha 61922, P.O. Box 551, Saudi Arabia
Zizhong Chen
University of California, Riverside, CA 92521
Franck Cappello
Argonne National Laboratory, IEEE Fellow
Parallel Processing, Parallel Computing, High Performance Computing, Fault Tolerance, Data Compression
Yanfei Guo
Argonne National Laboratory
Programming Model and Runtime Systems, HPC Systems, Cloud Computing, MapReduce and Big Data Processing, Autonomic Computing in V
R. Thakur
Argonne National Laboratory, Lemont, IL 60439