🤖 AI Summary
Real-time triangle counting in fully dynamic graph streams, where edges are frequently inserted and deleted and the graph size is not known in advance, remains highly challenging.
Method: This paper proposes the Distributed Triangle Counting (DTC) family of single-pass distributed streaming algorithms for global and local triangle counting. DTC-AR estimates triangle counts without prior knowledge of the graph size. DTC-FD combines Random Pairing with future edge insertion compensation so that estimates remain unbiased under both edge insertions and deletions. Both use distributed hash partitioning and cooperative dynamic updates across multiple machines.
Contribution/Results: In experiments, DTC-AR is up to 2029.4× and 27.1× more accurate than baseline methods while offering the best trade-off between accuracy and storage space. DTC-FD reduces estimation error by up to 32.5× and 19.3× and scales linearly with graph stream size.
📝 Abstract
Triangle counting is a fundamental problem in graph mining, essential for analyzing graph streams with arbitrary edge orders. However, exact counting becomes impractical due to the massive size of real-world graph streams. To address this, approximate algorithms have been developed, but existing distributed streaming algorithms lack adaptability and struggle with edge deletions. In this article, we propose DTC, a novel family of single-pass distributed streaming algorithms for global and local triangle counting in fully dynamic graph streams. Our DTC-AR algorithm accurately estimates triangle counts without prior knowledge of graph size, leveraging multi-machine resources. Additionally, we introduce DTC-FD, an algorithm tailored for fully dynamic graph streams that contain both edge insertions and deletions. Using Random Pairing and future edge insertion compensation, DTC-FD achieves unbiased and accurate approximations across multiple machines. Experimental results demonstrate significant improvements over baselines. DTC-AR achieves up to $2029.4\times$ and $27.1\times$ higher accuracy, while maintaining the best trade-off between accuracy and storage space. DTC-FD reduces estimation errors by up to $32.5\times$ and $19.3\times$, scaling linearly with graph stream size. These findings highlight the effectiveness of our proposed algorithms in tackling triangle counting in real-world scenarios. The source code and datasets are available at https://github.com/wayne4s/srds-dtc.git.
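
To make the Random Pairing idea mentioned above more concrete, the following is a minimal single-machine sketch of the classic Random Pairing sampling scheme, in which each new insertion is "paired" with an earlier, not yet compensated deletion so that a bounded-size edge sample stays uniform. It is not the DTC-FD implementation: the class name `RandomPairingSampler`, its fields, and the `capacity` parameter are hypothetical, and the triangle estimator, the future edge insertion compensation, and the distribution across machines are not shown.

```python
import random


class RandomPairingSampler:
    """Bounded-size uniform edge sample under insertions and deletions.

    Illustrative sketch only; all names here are assumptions, not DTC's API.
    """

    def __init__(self, capacity, seed=None):
        self.capacity = capacity   # memory budget: max number of sampled edges
        self.sample = []           # edges currently in the sample
        self.population = 0        # edges currently alive in the stream
        self.d_in = 0              # uncompensated deletions that hit the sample
        self.d_out = 0             # uncompensated deletions that missed it
        self.rng = random.Random(seed)

    def insert(self, edge):
        self.population += 1
        pending = self.d_in + self.d_out
        if pending == 0:
            # No deletions to compensate: plain reservoir sampling.
            if len(self.sample) < self.capacity:
                self.sample.append(edge)
            elif self.rng.random() < self.capacity / self.population:
                victim = self.rng.randrange(len(self.sample))
                self.sample[victim] = edge
        elif self.rng.random() < self.d_in / pending:
            # Pair this insertion with an earlier deletion from the sample.
            self.sample.append(edge)
            self.d_in -= 1
        else:
            # Pair it with an earlier deletion from outside the sample.
            self.d_out -= 1

    def delete(self, edge):
        self.population -= 1
        if edge in self.sample:
            self.sample.remove(edge)
            self.d_in += 1
        else:
            self.d_out += 1


sampler = RandomPairingSampler(capacity=3, seed=0)
for e in [(1, 2), (2, 3), (1, 3)]:
    sampler.insert(e)
sampler.delete((2, 3))   # removes a sampled edge, so d_in becomes 1
sampler.insert((3, 4))   # compensates that deletion and refills the sample
```

The sample never grows beyond `capacity`, because an insertion only refills a slot that an earlier deletion freed. A list is used for the sample to keep the sketch short; a real implementation would likely use a hash-based structure so that membership checks and removals take constant time.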