🤖 AI Summary
Conventional density-based clustering methods often fail, and scale poorly, on high-dimensional data whose local density varies significantly. This paper proposes a scalable varied-density clustering framework to address both problems. The method builds an approximate neighborhood graph that adapts to local density, using random projection, and formulates clustering as a density-aware label propagation process over that graph, so that intra-cluster density consistency is enforced implicitly through graph connectivity. It avoids explicit density estimation and reduces parameter sensitivity, and it relies on approximate nearest-neighbor graph construction and efficient graph propagation for speed. On datasets with millions of points, clustering completes in minutes with accuracy competitive with existing baselines and at substantially lower computational cost. The framework therefore combines a formal link between density-based clustering and graph connectivity with practical applicability to large-scale, high-dimensional clustering tasks.
📝 Abstract
We propose a novel perspective on varied-density clustering for high-dimensional data by framing it as a label propagation process in neighborhood graphs that adapt to local density variations. Our method formally connects density-based clustering with graph connectivity, enabling the use of efficient graph propagation techniques developed in network science. To ensure scalability, we introduce a density-aware neighborhood propagation algorithm and leverage advanced random projection methods to construct approximate neighborhood graphs. Our approach significantly reduces computational cost while preserving clustering quality. Empirically, it scales to datasets with millions of points in minutes and achieves competitive accuracy compared to existing baselines.
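The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm. It makes three assumptions:

- A plain Gaussian random projection stands in for the "advanced random projection methods".
- A brute-force k-nearest-neighbor search (quadratic in the number of points) stands in for the approximate graph construction.
- Simple min-label propagation, which computes connected components, stands in for the density-aware propagation.

All function names and parameters (`random_projection`, `knn_graph`, `propagate_labels`, `k`, `out_dim`) are hypothetical.

```python
import numpy as np


def random_projection(X, out_dim, rng):
    """Gaussian random projection to out_dim dimensions (assumed stand-in)."""
    R = rng.normal(size=(X.shape[1], out_dim)) / np.sqrt(out_dim)
    return X @ R


def knn_graph(Z, k):
    """Indices of the k nearest neighbors of each row of Z.

    Brute force, O(n^2): fine for a demo, not for millions of points.
    A kNN neighborhood shrinks in dense regions and grows in sparse ones,
    which is the sense in which this graph adapts to local density.
    """
    sq = np.sum(Z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (Z @ Z.T)
    np.fill_diagonal(d2, np.inf)  # exclude self-matches
    return np.argsort(d2, axis=1)[:, :k]


def propagate_labels(neighbors, max_iter=1000):
    """Min-label propagation over the symmetrized kNN graph.

    Every point starts with its own index as label and repeatedly takes
    the smallest label among itself and its graph neighbors. At the fixed
    point, points share a label exactly when they are connected.
    """
    n, k = neighbors.shape
    src = np.repeat(np.arange(n), k)
    dst = neighbors.ravel()
    labels = np.arange(n)
    for _ in range(max_iter):
        new = labels.copy()
        np.minimum.at(new, src, labels[dst])  # pull labels along i -> j
        np.minimum.at(new, dst, labels[src])  # and along j -> i
        if np.array_equal(new, labels):
            break
        labels = new
    return labels


# Toy data (hypothetical): a tight blob and a far-away, much looser blob.
rng = np.random.default_rng(0)
dense = rng.normal(scale=0.1, size=(200, 50))
sparse = rng.normal(scale=2.0, size=(200, 50)) + 50.0
X = np.vstack([dense, sparse])

Z = random_projection(X, out_dim=10, rng=rng)
labels = propagate_labels(knn_graph(Z, k=10))
```

A fixed-radius neighborhood would need one radius to suit both blobs. The kNN graph sidesteps that choice, which is the intuition behind tying varied-density clustering to graph connectivity. Scaling this sketch to millions of points would require replacing the brute-force search with an approximate nearest-neighbor method.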