AI Summary
To efficiently approximate the counts of all eight temporal triangle types in massive temporal graphs (millions of nodes, billions of temporal edges), this paper proposes STEP, the first streaming algorithm that uses sublinear memory and yields unbiased, low-variance estimates. The method combines edge-level predictions of triangle counts with lightweight sampling, forming a prediction-enhanced sampling framework. It is the first to provide a unified theoretical treatment of all eight temporal triangle types, with rigorous error bounds. On billion-edge datasets, the approach achieves significantly lower estimation error than state-of-the-art methods, reduces memory usage by one to two orders of magnitude, and improves throughput by up to several hundred times, meeting the requirements of high accuracy, low latency, and resource-constrained deployment.
Abstract
Triangle counting is a fundamental and widely studied problem on static graphs and, more recently, on temporal graphs, where edges carry information on the timing of the associated events. Streaming processing and resource efficiency are crucial requirements for counting triangles in modern massive temporal graphs, with millions of nodes and up to billions of temporal edges. However, current exact and approximate algorithms are unable to handle large-scale temporal graphs. To fill this gap, we introduce STEP, a scalable and efficient algorithm to approximate temporal triangle counts from a stream of temporal edges. STEP combines predictions of the number of triangles each temporal edge is involved in with a simple sampling strategy, leading to scalability, efficiency, and accurate approximation of all eight temporal triangle types simultaneously. We analytically prove that, using a sublinear amount of memory, STEP obtains unbiased and very accurate estimates. In fact, even noisy predictions can significantly reduce the variance of STEP's estimates. Our extensive experiments on massive temporal graphs with up to billions of edges demonstrate that STEP outputs high-quality estimates and is more efficient than state-of-the-art methods.
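The abstract describes STEP only at a high level, so the following is a minimal, hypothetical sketch of the general idea it names (predictions combined with sampling), not STEP itself. It assumes an undirected simplification in which edge directions are ignored, so the eight temporal triangle types are counted together; a stream of `(u, v, t)` tuples with strictly increasing timestamps; a time window `delta` bounding the span of a triangle; and a hypothetical oracle `predict_heavy` standing in for the edge-level predictions. Edges flagged as heavy are always stored, all other edges are stored with probability `p`, and each detected triangle is weighted by the inverse probability that its two earlier edges were stored (a Horvitz-Thompson estimator), which keeps the estimate unbiased whatever the oracle returns.

```python
import random
from collections import defaultdict


def estimate_temporal_triangles(stream, delta, p, predict_heavy, seed=0):
    """Horvitz-Thompson estimate of the number of delta-temporal triangles.

    Hypothetical sketch, not STEP. stream: iterable of (u, v, t) sorted by
    strictly increasing t, with u != v. A triangle is three edges on three
    distinct nodes whose timestamps t1 < t2 < t3 satisfy t3 - t1 <= delta.
    Direction is ignored, so all temporal triangle types are lumped together.
    predict_heavy(u, v, t) -> bool is a hypothetical oracle: flagged edges
    are stored with probability 1, all other edges with probability p.
    """
    rng = random.Random(seed)
    # adj[a][b] -> list of (timestamp, keep_probability) for stored edges {a, b}
    adj = defaultdict(lambda: defaultdict(list))
    estimate = 0.0
    for u, v, t in stream:
        # Count triangles closed by (u, v, t): a stored edge {u, w} and a
        # stored edge {v, w}, both with timestamps >= t - delta.
        for w, left in adj[u].items():
            if w == v or w not in adj[v]:
                continue
            for ta, qa in left:
                if t - ta > delta:
                    continue
                for tb, qb in adj[v][w]:
                    if t - tb <= delta:
                        # The two earlier edges were kept independently with
                        # probabilities qa and qb, so weight 1/(qa*qb) makes
                        # each triangle contribute 1 in expectation.
                        estimate += 1.0 / (qa * qb)
        # Store the current edge for future triangles: always if predicted
        # heavy, otherwise with probability p.
        q = 1.0 if predict_heavy(u, v, t) else p
        if rng.random() < q:
            adj[u][v].append((t, q))
            adj[v][u].append((t, q))
    return estimate


if __name__ == "__main__":
    stream = [(0, 1, 1), (1, 2, 2), (0, 2, 3), (0, 1, 4)]
    never_heavy = lambda u, v, t: False
    # With p = 1 every edge is kept, so the estimate is the exact count.
    print(estimate_temporal_triangles(stream, 5, 1.0, never_heavy))  # 2.0
    print(estimate_temporal_triangles(stream, 1, 1.0, never_heavy))  # 0.0
```

In this sketch the predictions affect only the variance: a triangle whose two earlier edges are both flagged heavy is counted with weight 1 and no sampling noise, so accurate predictions on triangle-dense edges lower the variance without affecting unbiasedness. The sketch never evicts stored edges; because an edge older than `t - delta` can no longer take part in a triangle, a memory-conscious version would drop such edges as the stream advances.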