Parallel Hierarchical Agglomerative Clustering in Low Dimensions

📅 2025-07-26
🤖 AI Summary
This work addresses efficient parallel hierarchical agglomerative clustering (HAC) in low-dimensional Euclidean space under non-monotone linkage functions, such as centroid and Ward's distance, for which no NC algorithm was previously known. We give the first NC algorithms for (1+ε)-approximate HAC under a general class of non-monotone linkage functions that includes centroid and Ward's distance. The key structural result is that, for this class, the hierarchy produced by any constant-approximate HAC on n points has height at most poly(log n) whenever the dimension satisfies k = O(log log n / log log log n). Building on this bound, the parallel algorithms merge near-minimum pairs using approximate nearest neighbors and dynamic geometric data structures, achieving polylogarithmic depth with polynomial work (i.e., NC). Complementing these upper bounds, we show that HAC with centroid linkage is CC-hard when k = n, so NC algorithms for arbitrary dimensions are unlikely to exist. Together, these results contrast a low-dimensional regime where parallel HAC is tractable with a high-dimensional regime where it is likely not, and provide theoretically grounded parallel algorithms for low-dimensional clustering.
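
To make "non-monotone" concrete, here is a short Python illustration. It is not code from the paper. It runs exact centroid-linkage HAC by hand on three hypothetical points in the plane, and the helper names `centroid` and `centroid_distance` are our own. The first merge happens at distance 1.0. After that merge, the new cluster's centroid lies closer to the third point, so the next merge happens at a smaller linkage value, 0.9.

```python
import math

def centroid(cluster):
    """Coordinate-wise mean of a list of equal-length point tuples."""
    k = len(cluster[0])
    return [sum(p[i] for p in cluster) / len(cluster) for i in range(k)]

def centroid_distance(a, b):
    """Centroid linkage: Euclidean distance between the two cluster means."""
    return math.dist(centroid(a), centroid(b))

# Hypothetical isosceles triangle: p and q are the unique closest pair.
p, q, r = (0.0, 0.0), (1.0, 0.0), (0.5, 0.9)

first = centroid_distance([p], [q])      # exact HAC merges p and q first, at 1.0
second = centroid_distance([p, q], [r])  # {p, q} has centroid (0.5, 0.0), so this is 0.9
print(first, second, second < first)
```

Linkage functions that "monotonically increase as merges are performed" never produce such a drop. The drop is exactly what rules out the earlier NC techniques for centroid linkage.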

📝 Abstract
Hierarchical Agglomerative Clustering (HAC) is an extensively studied and widely used method for hierarchical clustering in $\mathbb{R}^k$ based on repeatedly merging the closest pair of clusters according to an input linkage function $d$. Highly parallel (i.e., NC) algorithms are known for $(1+\varepsilon)$-approximate HAC (where near-minimum rather than minimum pairs are merged) for certain linkage functions that monotonically increase as merges are performed. However, no such algorithms are known for many important but non-monotone linkage functions such as centroid and Ward's linkage. In this work, we show that a general class of non-monotone linkage functions, which includes centroid and Ward's distance, admits efficient NC algorithms for $(1+\varepsilon)$-approximate HAC in low dimensions. Our algorithms are based on a structural result which may be of independent interest: the height of the hierarchy resulting from any constant-approximate HAC on $n$ points for this class of linkage functions is at most $\operatorname{poly}(\log n)$ as long as $k = O(\log \log n / \log \log \log n)$. Complementing our upper bounds, we show that NC algorithms for HAC with these linkage functions in arbitrary dimensions are unlikely to exist by showing that HAC is CC-hard when $d$ is centroid distance and $k = n$.
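
A brute-force sequential sketch of $(1+\varepsilon)$-approximate HAC under centroid linkage may help pin down the definitions above. This is an illustration under stated assumptions, not the paper's NC algorithm. Each step computes all pairwise centroid distances and treats every pair within a factor $(1+\varepsilon)$ of the current minimum as eligible. Among the eligible pairs, it merges the one with the fewest total points; that tie-breaking rule is an arbitrary choice of ours. The sketch also tracks the height of the resulting hierarchy, the quantity the structural result bounds by $\operatorname{poly}(\log n)$. The function names `approx_centroid_hac` and `_centroid_dist` are hypothetical.

```python
import math
from itertools import combinations

def _centroid_dist(c1, c2):
    """Centroid linkage between two clusters stored as (coordinate_sum, size, height)."""
    (s1, n1, _), (s2, n2, _) = c1, c2
    return math.dist([x / n1 for x in s1], [x / n2 for x in s2])

def approx_centroid_hac(points, eps):
    """Brute-force sequential (1+eps)-approximate HAC under centroid linkage.

    Returns (merges, height): merges lists (linkage value, merged size) in merge
    order; height is the number of merge levels on the longest root-to-leaf path.
    Expects at least one point.
    """
    # Active clusters keyed by id: (coordinate sum, number of points, subtree height).
    clusters = {i: (tuple(p), 1, 0) for i, p in enumerate(points)}
    next_id = len(points)
    merges = []
    while len(clusters) > 1:
        pairs = [(_centroid_dist(clusters[a], clusters[b]), a, b)
                 for a, b in combinations(clusters, 2)]
        best = min(d for d, _, _ in pairs)
        # Every pair within a (1+eps) factor of the current minimum may be merged.
        eligible = [t for t in pairs if t[0] <= (1 + eps) * best]
        # Hypothetical tie-breaking rule: prefer the eligible pair with fewest points.
        d, a, b = min(eligible, key=lambda t: clusters[t[1]][1] + clusters[t[2]][1])
        (sa, na, ha), (sb, nb, hb) = clusters.pop(a), clusters.pop(b)
        clusters[next_id] = (tuple(x + y for x, y in zip(sa, sb)), na + nb, 1 + max(ha, hb))
        merges.append((d, na + nb))
        next_id += 1
    (root,) = clusters.values()
    return merges, root[2]

if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (3.0,), (7.0,)]
    for eps in (0.0, 1.0):
        merges, height = approx_centroid_hac(pts, eps)
        print(f"eps={eps}: merges={merges}, height={height}")
```

On the four 1-D points 0, 1, 3, 7, exact HAC (`eps=0.0`) builds a chain of height 3. With `eps=1.0`, the second step may instead merge 3 and 7, because their distance of 4.0 is within twice the current minimum of 2.5. That choice gives a hierarchy of height 2. The example only shows how approximation widens the set of allowed merges. It says nothing about the paper's height bound, which concerns the worst case over all constant-approximate merge sequences.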
Problem

Develops NC algorithms for (1+ε)-approximate HAC under non-monotone linkage functions
Addresses centroid and Ward's linkage in low dimensions
Proves that HAC with centroid linkage is CC-hard in high dimensions (k = n)
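
Ward's linkage is commonly defined as the increase in within-cluster sum of squared errors caused by a merge. For clusters $A$ and $B$ with means $\mu_A$ and $\mu_B$, this increase equals $\frac{|A||B|}{|A|+|B|}\lVert\mu_A-\mu_B\rVert^2$. This summary does not state the paper's exact normalization, and some sources take a square root or drop a constant factor. Treat the sketch as the standard textbook form, not necessarily the paper's definition. It checks the identity numerically on hypothetical points, and `mean`, `sse`, and `ward_distance` are illustrative names.

```python
import math

def mean(cluster):
    """Coordinate-wise mean of a list of equal-length point tuples."""
    k = len(cluster[0])
    return [sum(p[i] for p in cluster) / len(cluster) for i in range(k)]

def sse(cluster):
    """Within-cluster sum of squared Euclidean distances to the cluster mean."""
    mu = mean(cluster)
    return sum(math.dist(p, mu) ** 2 for p in cluster)

def ward_distance(a, b):
    """Ward's linkage in the common form |A||B|/(|A|+|B|) * ||mu_A - mu_B||^2."""
    na, nb = len(a), len(b)
    return na * nb / (na + nb) * math.dist(mean(a), mean(b)) ** 2

# Hypothetical clusters in the plane.
A = [(0.0, 0.0), (1.0, 0.0)]
B = [(0.5, 0.9), (3.0, 1.0), (2.0, 2.0)]

increase = sse(A + B) - sse(A) - sse(B)  # SSE added by merging A and B
print(ward_distance(A, B), increase)     # equal up to floating-point rounding
```

Like centroid linkage, this form depends only on cluster sizes and means. Both can therefore be maintained in O(k) space per cluster as merges proceed.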
Innovation

NC algorithms for (1+ε)-approximate HAC with non-monotone linkage functions
Polylogarithmic hierarchy height for constant-approximate HAC when k = O(log log n / log log log n)
CC-hardness proof for centroid distance in high dimensions