A clusterability test for directed graphs

📅 2025-06-24
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses the problem of assessing clusterability in directed graphs without self-loops. We propose a directed-graph extension of the δ-test, defining directed neighborhood density and constructing a local-to-global density ratio as the test statistic; computational efficiency is achieved via neighborhood sampling. Theoretically and empirically, the method achieves high-accuracy detection of non-clusterable structures using only ≈1% of nodes as samples. It exhibits strong robustness under sparsity, noise, and mild violations of modeling assumptions—significantly outperforming conventional approaches based on undirected conversion or global graph statistics. To our knowledge, this is the first systematic generalization of the δ-test to directed graphs, providing a scalable, statistically principled tool for pre-clustering assessment in large-scale directed networks.

Technology Category

Application Category

📝 Abstract
In this article, we extend a statistical test of graph clusterability, the $δ$ test, to directed graphs with no self loops. The $δ$ test, originally designed for undirected graphs, is based on the premise that graphs with a clustered structure display a mean local density that is statistically higher than the graph's global density. We posit that graphs that do not meet this necessary (but not sufficient) condition for clusterability can be considered unsuited to clustering. In such cases, vertex clusters do not offer a meaningful summary of the broader graph. Additionally in this study, we aim to determine the optimal sample size (number of neighborhoods). Our test, designed for the analysis of large networks, is based on sampling subsets of neighborhoods/nodes. It is designed for cases where computing the density of every node's neighborhood is infeasible. Our results show that the $δ$ test performs very well, even with very small samples of neighborhoods ($1%$). It accurately detects unclusterable graphs and is also shown to be robust to departures from the underlying assumptions of the $t$ test.
Problem

Research questions and friction points this paper is trying to address.

Extends clusterability test to directed graphs
Determines optimal sample size for neighborhoods
Tests performance with small neighborhood samples
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends δ test to directed graphs
Uses sampling for large networks
Detects unclusterable graphs accurately
M
Mario R. Guarracino
Università degli Studi di Cassino e del Lazio Meridionale, Cassino, Italy
P
Pierre Miasnikof
Université Laval, Québec, QC, Canada
A
Alexander Y. Shestopaloff
Queen Mary University of London, London, United Kingdom; Memorial University of Newfoundland, St-John’s, NL, Canada
H
Houyem Demni
Università degli Studi di Cassino e del Lazio Meridionale, Cassino, Italy
Cristián Bravo
Cristián Bravo
Professor and Canada Research Chair, Western University
Credit ScoringCredit RiskFintechBusiness AnalyticsData Science
Y
Yuri Lawryshyn
University of Toronto, Toronto, ON, Canada