🤖 AI Summary
This work addresses the high computational cost of estimating multiple nonlinear eigenvectors in nonlinear spectral clustering by presenting an implementation of a direct multiway spectral clustering algorithm in the p-norm (with p ∈ (1,2]). The algorithm is implemented with a novel C++ GraphBLAS API, which expresses its key operations as linear algebraic operations over sparse matrices and dense vectors, parameterized by the algebra pertinent to each computation. The implementation targets shared-memory parallelism. Strong scaling is demonstrated on datasets with up to 8 million nodes and 48 million edges, and comparisons against competitive methods on artificial test cases indicate high-quality clusters in terms of the balanced graph cut metric.
📝 Abstract
Nonlinear reformulations of the spectral clustering method have recently attracted considerable attention owing to their numerical benefits and solid mathematical foundation. However, estimating multiple nonlinear eigenvectors comes at an increased computational cost. We present an implementation of a direct multiway spectral clustering algorithm in the $p$-norm, for $p\in(1,2]$, using a novel C++ GraphBLAS API. The key operations are expressed in linear algebraic terms and are executed over the resulting sparse matrices and dense vectors, parameterized in the algebra pertinent to the computation. We demonstrate the effectiveness and accuracy of our shared-memory algorithm on several artificial test cases. Our numerical examples and comparative results against competitive methods indicate that the proposed implementation attains high-quality clusters in terms of the balanced graph cut metric. The strong scaling capabilities of our algorithm are showcased on a range of datasets with up to $8$ million nodes and $48$ million edges.
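To make the idea of operations "parameterized in the algebra pertinent to the computation" concrete, the following is a minimal sketch in plain C++ of a sparse matrix-vector product whose addition and multiplication are supplied by a semiring type. It does not use the GraphBLAS API described in the abstract: `CsrMatrix`, `PlusTimes`, `MinPlus`, `spmv`, `p_energy`, and `example_path_graph` are hypothetical names introduced only for illustration. The `p_energy` helper evaluates the textbook form $\frac{1}{2}\sum_{i,j} w_{ij}\,|x_i - x_j|^p$ of a graph $p$-Laplacian energy, which is assumed here and may differ from the functional the authors actually minimize.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Minimal CSR (compressed sparse row) matrix. Illustrative type, not a GraphBLAS container.
struct CsrMatrix {
    std::size_t rows = 0;
    std::size_t cols = 0;
    std::vector<std::size_t> row_ptr;  // size rows + 1
    std::vector<std::size_t> col_idx;  // column index of each stored entry
    std::vector<double> values;        // weight of each stored entry
};

// A semiring supplies "addition", "multiplication" and the additive identity.
struct PlusTimes {  // ordinary linear algebra
    static double zero() { return 0.0; }
    static double add(double a, double b) { return a + b; }
    static double mul(double a, double b) { return a * b; }
};

struct MinPlus {  // tropical algebra, as used in shortest-path style relaxations
    static double zero() { return std::numeric_limits<double>::infinity(); }
    static double add(double a, double b) { return a < b ? a : b; }
    static double mul(double a, double b) { return a + b; }
};

// y = A * x, where the meaning of "+" and "*" is chosen by the Semiring parameter.
template <typename Semiring>
std::vector<double> spmv(const CsrMatrix& A, const std::vector<double>& x) {
    assert(x.size() == A.cols);
    std::vector<double> y(A.rows, Semiring::zero());
    for (std::size_t i = 0; i < A.rows; ++i) {
        double acc = Semiring::zero();
        for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k) {
            acc = Semiring::add(acc, Semiring::mul(A.values[k], x[A.col_idx[k]]));
        }
        y[i] = acc;
    }
    return y;
}

// Assumed textbook p-Laplacian energy: 0.5 * sum_{i,j} w_ij * |x_i - x_j|^p.
// With a symmetric A that stores both (i,j) and (j,i), the factor 0.5 counts each edge once.
double p_energy(const CsrMatrix& A, const std::vector<double>& x, double p) {
    assert(x.size() == A.cols && A.rows == A.cols);
    double total = 0.0;
    for (std::size_t i = 0; i < A.rows; ++i) {
        for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k) {
            total += A.values[k] * std::pow(std::fabs(x[i] - x[A.col_idx[k]]), p);
        }
    }
    return 0.5 * total;
}

// Weighted path graph 0 -(1.0)- 1 -(2.0)- 2, stored symmetrically.
CsrMatrix example_path_graph() {
    CsrMatrix A;
    A.rows = 3;
    A.cols = 3;
    A.row_ptr = {0, 1, 3, 4};
    A.col_idx = {1, 0, 2, 1};
    A.values = {1.0, 1.0, 2.0, 2.0};
    return A;
}
```

With `x = {1, 2, 4}` on the example graph, `spmv<PlusTimes>` yields `{2, 9, 4}`, `spmv<MinPlus>` yields `{3, 2, 4}`, and `p_energy(A, x, 2.0)` evaluates to `9`. Each output row depends only on its own row of the matrix, so the outer loop in `spmv` is the natural place to apply shared-memory parallelism.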