🤖 AI Summary
This paper addresses the problem of efficiently estimating the average local triadic coefficient—a measure of node-group-level clustering—in large-scale graphs. To overcome the computational intractability of exact triangle enumeration, we propose Triad, the first unbiased probabilistic estimator specifically designed for this metric. Our method integrates an adaptive edge-sampling strategy with a theoretically optimal unbiased estimator and derives, for the first time, a tight upper bound on its sample complexity. Unlike existing approaches, Triad enables fine-grained structural analysis while achieving sublinear time complexity and high estimation accuracy. Experiments across multiple real-world networks demonstrate that Triad attains estimation errors below 2% and runs one to two orders of magnitude faster than state-of-the-art baselines. A case study further validates that the average local triadic coefficient effectively uncovers attribute-driven higher-order clustering patterns in collaboration networks.
📝 Abstract
Characterizing graph properties is fundamental to the analysis and understanding of real-world networked systems. The local clustering coefficient and the more recently introduced local closure coefficient capture powerful properties that are essential in a large number of applications, ranging from graph embeddings to graph partitioning. Such coefficients capture the local density of the neighborhood of each node, considering incident triadic structures and paths of length two. For this reason, we refer to these coefficients collectively as local triadic coefficients.
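For reference, the two coefficients can be written as follows. These are the standard definitions from the literature, not notation taken from this paper: the symbols $t(v)$ (number of triangles containing $v$), $d(v)$ (degree of $v$), and $N(v)$ (neighbor set of $v$) are assumptions introduced here for illustration.

```latex
% Local clustering coefficient: fraction of wedges centered at v that are closed.
\[
  C(v) \;=\; \frac{t(v)}{\binom{d(v)}{2}} \;=\; \frac{2\,t(v)}{d(v)\,\bigl(d(v)-1\bigr)},
  \qquad d(v) \ge 2 .
\]

% Local closure coefficient: fraction of length-2 paths starting at v that are closed.
% Each triangle containing v closes two such paths, hence the factor 2.
\[
  H(v) \;=\; \frac{2\,t(v)}{\sum_{u \in N(v)} \bigl(d(u)-1\bigr)} .
\]
```

Both share the numerator $t(v)$ and differ only in which paths of length two they normalize by, which is why they can be treated together.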
In this work, we consider the novel problem of efficiently computing the average of local triadic coefficients over a given partition of the nodes of the input graph into a set of disjoint buckets. The average local triadic coefficients of the nodes in each bucket provide better insight into the interplay between graph structure and the properties of the nodes associated with each bucket. Unfortunately, exact computation, which requires listing all triangles in a graph, is infeasible for large networks. Hence, we focus on obtaining highly accurate probabilistic estimates.
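To make the target quantity concrete, here is a minimal sketch of the exact (non-scalable) computation on a small graph. It assumes the standard definitions of the local clustering coefficient (triangles at a node divided by wedges centered at it) and the local closure coefficient (twice the triangles divided by length-2 paths starting at the node), and assigns 0 when the denominator is 0. The function names and the `bucket_of` mapping are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}, skipping self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def local_triadic_coefficients(adj):
    """Return (clustering, closure) dicts computed exactly for every node."""
    clustering, closure = {}, {}
    for u, nbrs in adj.items():
        deg = len(nbrs)
        # Each triangle at u is seen once per other endpoint, so halve the sum.
        triangles = sum(len(nbrs & adj[v]) for v in nbrs) // 2
        wedges_centered = deg * (deg - 1) // 2               # paths v-u-w
        wedges_headed = sum(len(adj[v]) - 1 for v in nbrs)   # paths u-v-w
        clustering[u] = triangles / wedges_centered if wedges_centered else 0.0
        closure[u] = 2 * triangles / wedges_headed if wedges_headed else 0.0
    return clustering, closure


def bucket_averages(coeff, bucket_of):
    """Average a per-node coefficient over each bucket of a node partition."""
    totals, counts = defaultdict(float), defaultdict(int)
    for node, bucket in bucket_of.items():
        totals[bucket] += coeff.get(node, 0.0)
        counts[bucket] += 1
    return {b: totals[b] / counts[b] for b in totals}


# Toy input: a triangle 0-1-2 with a pendant node 3 attached to node 2.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
bucket_of = {0: "a", 1: "a", 2: "b", 3: "b"}
clustering, closure = local_triadic_coefficients(build_adjacency(edges))
print(bucket_averages(clustering, bucket_of))
print(bucket_averages(closure, bucket_of))
```

The neighbor-set intersections amount to listing every triangle, which is the cost that becomes prohibitive on large graphs and motivates estimation instead.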
We develop Triad, an adaptive sampling-based algorithm that estimates the average local triadic coefficients for a partition of the nodes into buckets. Triad is based on a new class of unbiased estimators and on non-trivial bounds on its sample complexity, enabling the efficient computation of highly accurate estimates. Finally, we show how Triad can be used efficiently in practice on large networks, and we present a case study showing that average local triadic coefficients can capture high-order patterns over collaboration networks.
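The abstract does not specify Triad's estimators or its adaptive stopping rule, so the following is not Triad. It is only a sketch of the general idea of an unbiased sampling estimator, using plain uniform edge sampling with a fixed sample size to estimate the average local clustering coefficient of one bucket. It assumes `edges` is a list of distinct undirected edges without self-loops; all names are hypothetical.

```python
import random
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def edge_contribution(adj, bucket, u, v):
    """For every triangle containing edge (u, v), add 1 / C(deg(x), 2)
    for each triangle node x that belongs to the bucket."""
    total = 0.0
    for w in adj[u] & adj[v]:
        for x in (u, v, w):
            if x in bucket:
                d = len(adj[x])  # d >= 2 because x lies in a triangle
                total += 2.0 / (d * (d - 1))
    return total


def estimate_bucket_clustering(edges, bucket, num_samples, seed=0):
    """Estimate the bucket's average local clustering coefficient.

    Every triangle is reachable through 3 of the m edges, so scaling the
    sample mean by m / 3 gives an unbiased estimate of the bucket's sum of
    coefficients; dividing by the bucket size gives the average.
    """
    adj = build_adjacency(edges)
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        u, v = rng.choice(edges)  # uniform sample with replacement
        total += edge_contribution(adj, bucket, u, v)
    return len(edges) * total / (3 * num_samples * len(bucket))


# Toy input: a triangle 0-1-2 with a pendant node 3; the bucket is {2, 3}.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
print(estimate_bucket_clustering(edges, {2, 3}, num_samples=2000, seed=1))
```

A fixed `num_samples` is the simplest possible choice; an adaptive method would instead decide how many samples to draw based on the data and the desired accuracy.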