🤖 AI Summary
Spectral clustering for speaker diarization usually depends on manual hyperparameter tuning when refining the affinity matrix. To remove this dependence, the paper proposes SC-pNA, an adaptive sparsification method. SC-pNA introduces (1) a row-wise pruning mechanism that splits each row's similarity scores into two clusters and retains the top p% of the higher-similarity cluster, requiring no external validation data or hyperparameter adjustment, and (2) an automatic cluster number estimation strategy that combines local neighborhood adaptation with the largest eigengap criterion. Evaluated on the DIHARD-III benchmark, SC-pNA is reported to achieve state-of-the-art performance, significantly outperforming existing automated tuning approaches while also being computationally more efficient. The summary describes this work as the first fully parameter-free spectral clustering pipeline for speaker diarization, maintaining high clustering accuracy while substantially improving robustness and practical applicability.
📝 Abstract
Spectral clustering has proven effective in grouping speech representations for speaker diarization tasks, although post-processing the affinity matrix remains difficult due to the need for careful tuning before constructing the Laplacian. In this study, we present a novel pruning algorithm for creating a sparse affinity matrix, called spectral clustering on $p$-neighborhood retained affinity matrix (SC-pNA). Our method improves on node-specific fixed neighbor selection by allowing a variable number of neighbors, and it eliminates the need for external tuning data because the pruning parameters are derived directly from the affinity matrix. SC-pNA does so by identifying two clusters in every row of the initial affinity matrix and retaining only the top $p\%$ of similarity scores from the cluster containing the larger similarities. Spectral clustering is then performed, with the number of clusters determined by the maximum eigengap. Experimental results on the challenging DIHARD-III dataset highlight the superiority of SC-pNA, which is also computationally more efficient than existing auto-tuning approaches.
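The row-wise pruning and eigengap steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. Several details are assumptions that the abstract does not specify: the function names (`two_means_1d`, `prune_rows`, `estimate_num_clusters`) are hypothetical, the two-cluster split uses a plain 1-D 2-means loop, the default value of `p` is only a placeholder, the pruned matrix is symmetrised with an element-wise maximum, and the eigengap is computed on the symmetric normalized Laplacian.

```python
import numpy as np


def two_means_1d(values, n_iter=50):
    """Split 1-D values into a low and a high group (assumed: plain 2-means).

    Returns a boolean mask that is True for members of the high group.
    """
    lo, hi = values.min(), values.max()
    high = values >= (lo + hi) / 2.0
    for _ in range(n_iter):
        if high.all() or not high.any():
            break  # degenerate split, nothing left to refine
        lo, hi = values[~high].mean(), values[high].mean()
        new_high = np.abs(values - hi) < np.abs(values - lo)
        if np.array_equal(new_high, high):
            break  # assignments stopped changing
        high = new_high
    return high


def prune_rows(affinity, p=20.0):
    """Keep, per row, the top p% of scores from the higher-similarity cluster.

    The default p is a placeholder, not a value taken from the paper.
    """
    pruned = np.zeros_like(affinity, dtype=float)
    for i in range(affinity.shape[0]):
        row = affinity[i]
        high_idx = np.flatnonzero(two_means_1d(row))
        # The number of retained neighbours varies with the cluster size.
        keep = max(1, int(np.ceil(len(high_idx) * p / 100.0)))
        top = high_idx[np.argsort(row[high_idx])[::-1][:keep]]
        pruned[i, top] = row[top]
    # Assumed symmetrisation so that the Laplacian is symmetric.
    return np.maximum(pruned, pruned.T)


def estimate_num_clusters(affinity, max_clusters=10):
    """Pick the cluster count at the largest gap between Laplacian eigenvalues."""
    n = affinity.shape[0]
    degree = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    laplacian = np.eye(n) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    eigvals = np.linalg.eigvalsh(laplacian)  # returned in ascending order
    gaps = np.diff(eigvals[: max_clusters + 1])
    return int(np.argmax(gaps)) + 1


if __name__ == "__main__":
    # Toy affinity matrix: random symmetric scores with a unit diagonal.
    rng = np.random.default_rng(0)
    scores = rng.uniform(0.0, 1.0, size=(12, 12))
    toy = (scores + scores.T) / 2.0
    np.fill_diagonal(toy, 1.0)
    sparse = prune_rows(toy)
    print(estimate_num_clusters(sparse))
```

Because the number of retained entries is a percentage of each row's high-similarity cluster, rows with many strong neighbours keep more entries than rows with few. That is the variable-neighbour behaviour the abstract contrasts with fixed neighbour selection.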