Self-Tuning Spectral Clustering for Speaker Diarization

📅 2024-09-16
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address speaker diarization's reliance on manually tuned hyperparameters when optimizing the affinity matrix for spectral clustering, this paper proposes SC-pNA, an adaptive sparsification method. Methodologically, SC-pNA introduces (1) a row-wise pruning mechanism that discriminates two clusters of similarity scores in each row and adaptively retains only the top p% from the higher-similarity cluster, requiring no external validation data or hyperparameter adjustment, and (2) automatic estimation of the number of speakers via the largest-eigengap criterion. Evaluated on the DIHARD-III benchmark, SC-pNA outperforms existing automated tuning approaches while also improving computational efficiency. The result is a spectral clustering pipeline for speaker diarization whose pruning parameters are derived directly from the data, maintaining high clustering accuracy while enhancing robustness and practical applicability.

📝 Abstract
Spectral clustering has proven effective in grouping speech representations for speaker diarization tasks, although post-processing the affinity matrix remains difficult due to the need for careful tuning before constructing the Laplacian. In this study, we present a novel pruning algorithm to create a sparse affinity matrix called *spectral clustering on p-neighborhood retained affinity matrix* (SC-pNA). Our method improves on node-specific fixed neighbor selection by allowing a variable number of neighbors, eliminating the need for external tuning data as the pruning parameters are derived directly from the affinity matrix. SC-pNA does so by identifying two clusters in every row of the initial affinity matrix, and retains only the top p% similarity scores from the cluster containing larger similarities. Spectral clustering is performed subsequently, with the number of clusters determined as the maximum eigengap. Experimental results on the challenging DIHARD-III dataset highlight the superiority of SC-pNA, which is also computationally more efficient than existing auto-tuning approaches.
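The row-wise pruning described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the 2-means split per row and the p value here are assumptions for demonstration (the paper derives its pruning parameters from the affinity matrix itself).

```python
import numpy as np
from sklearn.cluster import KMeans

def prune_affinity_pna(A, p=0.2):
    """Sketch of SC-pNA-style row-wise pruning (p=0.2 is illustrative,
    not the paper's derived value). For each row, split the similarity
    scores into two clusters with 2-means, keep only the top p% of
    scores from the higher-similarity cluster, and zero out the rest."""
    A = np.asarray(A, dtype=float)
    A_sparse = np.zeros_like(A)
    for i in range(A.shape[0]):
        row = A[i]
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
            row.reshape(-1, 1))
        # The cluster with the larger mean similarity holds the likely neighbors.
        hi_label = 0 if row[labels == 0].mean() > row[labels == 1].mean() else 1
        idx = np.where(labels == hi_label)[0]
        # Retain the top p% of scores within that cluster (at least one entry).
        k = max(1, int(np.ceil(p * idx.size)))
        keep = idx[np.argsort(row[idx])[-k:]]
        A_sparse[i, keep] = row[keep]
    # Symmetrize so the pruned matrix remains a valid affinity matrix.
    return np.maximum(A_sparse, A_sparse.T)
```

Because the number of retained neighbors depends on each row's own cluster of high similarities, different rows keep different numbers of neighbors, which is the variable-neighbor behavior the abstract contrasts with fixed k-neighbor pruning.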
Problem

Research questions and friction points this paper is trying to address.

Improves spectral clustering for speaker diarization by pruning affinity matrix
Eliminates need for external tuning data via adaptive neighbor selection
Enhances computational efficiency and accuracy on the DIHARD-III dataset
Innovation

Methods, ideas, or system contributions that make the work stand out.

Variable neighbor selection for sparse affinity matrix
Cluster-based pruning retains top similarity scores
Auto-tuning via maximum eigengap for cluster count
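The eigengap-based cluster count selection mentioned above can be sketched as follows. This is a generic illustration of the maximum-eigengap criterion on a normalized graph Laplacian, not the paper's exact procedure; the cap `max_speakers` is an assumed parameter for demonstration.

```python
import numpy as np

def estimate_num_speakers(A, max_speakers=10):
    """Sketch of the maximum-eigengap criterion (max_speakers is an
    illustrative cap, not from the paper). Build the symmetric
    normalized Laplacian of the (pruned) affinity matrix and pick the
    cluster count at the largest gap between consecutive eigenvalues."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)
    d[d == 0] = 1e-12                      # guard against isolated nodes
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(A.shape[0]) - D_inv_sqrt @ A @ D_inv_sqrt  # normalized Laplacian
    eigvals = np.sort(np.linalg.eigvalsh(L))              # ascending eigenvalues
    gaps = np.diff(eigvals[:max_speakers + 1])
    return int(np.argmax(gaps)) + 1        # largest eigengap -> cluster count
```

On an affinity matrix with well-separated blocks, the first few Laplacian eigenvalues sit near zero and the gap to the next eigenvalue marks the number of blocks, i.e. speakers.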