Scalable Varied-Density Clustering via Graph Propagation

📅 2025-08-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Conventional density-based clustering methods often fail, and scale poorly, on high-dimensional data with significant local density variations. To address this, the paper proposes a scalable varied-density clustering framework. The method constructs a density-adaptive approximate neighborhood graph, built via random projection, and formulates clustering as a density-aware label propagation process over this graph, implicitly enforcing intra-cluster density consistency through graph connectivity. By bypassing explicit density estimation and mitigating parameter sensitivity, the approach enables fast approximate nearest neighbor graph construction and efficient graph propagation. On datasets with millions of points, it completes clustering in minutes and achieves accuracy competitive with existing baselines at substantially lower computational cost. The framework is thus grounded in density-aware graph modeling and practical for large-scale, high-dimensional clustering tasks.

📝 Abstract
We propose a novel perspective on varied-density clustering for high-dimensional data by framing it as a label propagation process in neighborhood graphs that adapt to local density variations. Our method formally connects density-based clustering with graph connectivity, enabling the use of efficient graph propagation techniques developed in network science. To ensure scalability, we introduce a density-aware neighborhood propagation algorithm and leverage advanced random projection methods to construct approximate neighborhood graphs. Our approach significantly reduces computational cost while preserving clustering quality. Empirically, it scales to datasets with millions of points in minutes and achieves competitive accuracy compared to existing baselines.
Problem

Research questions and friction points this paper is trying to address.

Clustering high-dimensional data with varying densities
Connecting density-based clustering with graph connectivity
Scaling to large datasets efficiently without losing accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Label propagation in adaptive neighborhood graphs
Density-aware neighborhood propagation algorithm
Approximate graphs via random projection methods
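
The three ideas listed above can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a simple sorted-window scheme over 1-D random projections to collect neighbor candidates, a mutual k-NN graph as the density-adaptive neighborhood graph, and minimum-label propagation (which recovers connected components) as a stand-in for the density-aware propagation step. The function names `approx_knn` and `propagate_min_label` and the parameters `n_proj` and `window` are hypothetical.

```python
import math
import random


def approx_knn(points, k, n_proj=8, window=10, seed=0):
    """Approximate k-NN lists from 1-D random projections (illustrative only)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    candidates = [set() for _ in range(n)]
    for _ in range(n_proj):
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        proj = [sum(x * d for x, d in zip(p, direction)) for p in points]
        order = sorted(range(n), key=proj.__getitem__)
        # Points that are close in space stay close in the sorted projection,
        # so only a small window around each position is examined.
        for pos, i in enumerate(order):
            for j in order[max(0, pos - window):pos + window + 1]:
                if j != i:
                    candidates[i].add(j)
    # Re-rank the candidates by true distance and keep the k closest.
    return [
        sorted(candidates[i], key=lambda j: math.dist(points[i], points[j]))[:k]
        for i in range(n)
    ]


def propagate_min_label(knn):
    """Propagate the smallest label along mutual k-NN edges until stable."""
    n = len(knn)
    neighbor_sets = [set(nb) for nb in knn]
    # Keep an edge only if each endpoint lists the other. The k-NN radius
    # differs per point, so the graph adapts to local density (assumption:
    # mutual k-NN stands in for the paper's density-adaptive graph).
    adj = [[j for j in knn[i] if i in neighbor_sets[j]] for i in range(n)]
    labels = list(range(n))
    changed = True
    while changed:  # labels only decrease, so this loop terminates
        changed = False
        for i in range(n):
            best = min([labels[i]] + [labels[j] for j in adj[i]])
            if best < labels[i]:
                labels[i] = best
                changed = True
    return labels


# Two grids with a 10x difference in spacing, placed far apart.
dense = [(0.1 * x, 0.1 * y) for x in range(5) for y in range(5)]
sparse = [(100.0 + x, 100.0 + y) for x in range(5) for y in range(5)]
labels = propagate_min_label(approx_knn(dense + sparse, k=4))
print(len(set(labels)), "clusters found")
```

Because neighborhoods are defined by `k` rather than by a global radius, the dense and sparse grids are handled with the same parameter setting; points that end up with the same label form one cluster.
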
Ninh Pham
Senior Lecturer at University of Auckland
Data Mining, Similarity Search, Randomized Algorithms
Yingtao Zheng
University of Auckland, Auckland, New Zealand
Hugo Phibbs
University of Auckland, Auckland, New Zealand