On Optimizing Resource Utilization in Distributed Connected Components

📅 2025-07-04
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address low memory and network bandwidth utilization and limited scalability in distributed connected components (CC) computation, this paper proposes two novel distributed union-find algorithms—SiskinCC and RobinCC. Building upon the Jayanti–Tarjan theoretical framework, both algorithms introduce a shared-array memory layout to enhance cache locality and memory access efficiency; further, they incorporate a cross-machine communication compression mechanism coupled with graph-structure-aware scheduling—leveraging real-world graphs’ degree distributions and clustering properties—to significantly reduce communication overhead. Evaluated on a real-world hyper-scale graph with 500 billion edges and 11.7 billion vertices, the algorithms achieve up to 58.5× speedup over state-of-the-art distributed CC methods (e.g., DCI and distributed Ligra+) on a 2048-core CPU system, while reducing memory footprint by 37% and network bandwidth consumption by 52%.

Technology Category

Application Category

📝 Abstract
Connected Components (CC) is a core graph problem with numerous applications. This paper investigates accelerating distributed CC by optimizing memory and network bandwidth utilization. We present two novel distributed CC algorithms, SiskinCC and RobinCC, which are built upon the Jayanti-Tarjan disjoint set union algorithm. To optimize memory utilization, SiskinCC and RobinCC are designed to facilitate efficient access to a shared array for all cores running in a machine. This allows execution of faster algorithms with larger memory bounds. SiskinCC leverages the continuous inter-machine communication during the computation phase to reduce the final communication overhead and RobinCC leverages the structural properties of real-world graphs to optimize network bandwidth utilization. Our evaluation against state-of-the-art CC algorithms, using real-world and synthetic graphs with up to 500 billion edges and 11.7 billion vertices, and on up to 2048 CPU cores, demonstrates that SiskinCC and RobinCC achieve up to 58.5 times speedup.
Problem

Research questions and friction points this paper is trying to address.

Optimizing memory utilization in distributed connected components algorithms
Reducing network bandwidth usage for large-scale graph processing
Accelerating distributed CC computation on massive real-world graphs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimizes memory via shared array access
Reduces communication via continuous inter-machine exchange
Leverages graph structure for bandwidth efficiency
🔎 Similar Papers
No similar papers found.