Speaker Diarization with Overlapping Community Detection Using Graph Attention Networks and Label Propagation Algorithm

📅 2025-06-03
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Traditional clustering-based speaker diarization methods struggle to model complex speaker-embedding distributions and to assign overlapping speech segments accurately when several speakers talk at once. To address this, we propose an end-to-end overlapping community detection framework that jointly identifies multiple speakers within the same speech segment. Our approach is the first to integrate Graph Attention Networks (GAT) with a Multi-Label Propagation Algorithm (MLPA), enabling discriminative multi-speaker assignment per segment. Additionally, we introduce a speaker embedding optimization strategy to enhance inter-speaker separability. Evaluated on the DIHARD-III benchmark, our method achieves a Diarization Error Rate (DER) of 15.94% without oracle Voice Activity Detection (VAD), a new state of the art, and 11.07% with oracle VAD. This work addresses a key bottleneck in unsupervised overlapping community discovery and substantially improves speaker diarization accuracy on overlapping speech.
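Both GAT and the label propagation step operate on a graph whose nodes are speech segments. The page does not say how the edges are built; a common choice in graph-based diarization, assumed here purely for illustration, is a k-nearest-neighbour graph over cosine similarities between segment embeddings. The names `cosine`, `knn_affinity_graph`, and `k` are hypothetical, not taken from the paper.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def knn_affinity_graph(emb: List[Sequence[float]], k: int) -> List[List[int]]:
    """Hypothetical graph builder (an assumption, not the paper's method):
    link segments i and j if either is among the other's k most
    cosine-similar segments. Returns sorted neighbour lists."""
    n = len(emb)
    neigh = [set() for _ in range(n)]
    for i in range(n):
        sims = sorted(
            ((cosine(emb[i], emb[j]), j) for j in range(n) if j != i),
            reverse=True,
        )
        for _, j in sims[:k]:
            neigh[i].add(j)
            neigh[j].add(i)  # symmetrise so the graph is undirected
    return [sorted(s) for s in neigh]


# Toy 2-D "embeddings": segments 0-1 resemble one speaker, 2-3 another.
segments = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(knn_affinity_graph(segments, k=1))  # [[1], [0], [3], [2]]
```

Linking i and j when either one selects the other keeps the graph undirected, which label propagation normally assumes.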


πŸ“ Abstract
In speaker diarization, traditional clustering-based methods remain widely used in real-world applications. However, these methods struggle with the complex distribution of speaker embeddings and with overlapping speech segments. To address these limitations, we propose an Overlapping Community Detection method based on Graph Attention Networks and the Label Propagation Algorithm (OCDGALP). The proposed framework comprises two key components: (1) a graph attention network that refines speaker embeddings and node connections by aggregating information from neighboring nodes, and (2) a label propagation algorithm that assigns multiple community labels to each node, enabling simultaneous clustering and overlapping community detection. Experimental results show that the proposed method significantly reduces the Diarization Error Rate (DER), achieving a state-of-the-art 15.94% DER on the DIHARD-III dataset without oracle Voice Activity Detection (VAD) and 11.07% DER with oracle VAD.
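The first component can be illustrated with the standard single-head graph attention layer of Veličković et al. (2018): each node (here, a speech segment) scores its neighbours with an attention vector, normalises the scores with a softmax, and replaces its embedding with the attention-weighted sum of projected neighbour embeddings. This is a minimal, untrained pure-Python sketch; the projection `w`, the attention vector `a`, and the helper names are illustrative assumptions, and the paper's actual architecture (heads, layers, training objective) is not described on this page.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    """Multiply matrix w (one row per output dimension) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def elu(x: float) -> float:
    return x if x > 0 else math.exp(x) - 1.0


def gat_layer(h: List[Vector], adj: List[List[int]],
              w: Matrix, a: Vector) -> List[Vector]:
    """One single-head graph attention layer (standard GAT formulation).

    h:   node features, e.g. one speaker embedding per speech segment
    adj: adj[i] lists the neighbours of node i; a self-loop is added here
    w:   shared projection matrix, shape (d_out, d_in)
    a:   attention vector of length 2 * d_out
    """
    z = [matvec(w, hi) for hi in h]  # projected features W h_i
    d_out = len(w)
    out: List[Vector] = []
    for i, zi in enumerate(z):
        neigh = [i] + [j for j in adj[i] if j != i]
        # e_ij = LeakyReLU(a^T [z_i || z_j]); the z_i half is shared by all j.
        left = sum(a[k] * zi[k] for k in range(d_out))
        scores = [leaky_relu(left + sum(a[d_out + k] * z[j][k] for k in range(d_out)))
                  for j in neigh]
        # Softmax over the neighbourhood (shift by the max for stability).
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        # Attention-weighted sum of projected neighbour features, then ELU.
        agg = [sum(al * z[j][k] for al, j in zip(alphas, neigh)) for k in range(d_out)]
        out.append([elu(x) for x in agg])
    return out


# Toy graph: 3 segments with 2-D embeddings and an identity projection.
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adj = [[1], [0, 2], [1]]
identity = [[1.0, 0.0], [0.0, 1.0]]
# With a = 0 every neighbour gets equal weight, so this is a neighbourhood mean.
print(gat_layer(h, adj, identity, a=[0.0, 0.0, 0.0, 0.0]))
# A non-zero a shifts weight toward neighbours with a large first feature.
print(gat_layer(h, adj, identity, a=[0.0, 0.0, 5.0, 0.0]))
```

The abstract says the network also refines node connections; this sketch only updates node features and leaves the adjacency unchanged.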
Problem

Research questions and friction points this paper is trying to address.

Addresses limitations of clustering-based speaker diarization methods
Handles complex speaker embedding distributions and overlapping speech
Reduces Diarization Error Rate using graph attention networks
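DER, the metric quoted throughout this page, is conventionally the sum of false-alarm, missed-speech, and speaker-confusion time divided by the total reference speech time. A minimal sketch with made-up durations (not figures from the paper); real scoring tools also find the optimal mapping between reference and system speakers before measuring confusion, which this skips. The function name is illustrative.

```python
def diarization_error_rate(false_alarm: float, missed: float,
                           confusion: float, total_reference: float) -> float:
    """Return DER as a fraction: (FA + MISS + CONF) / total reference speech.

    All arguments are durations in seconds. Overlapped regions count once per
    active reference speaker, so total_reference can exceed the audio length.
    """
    if total_reference <= 0:
        raise ValueError("total_reference must be positive")
    return (false_alarm + missed + confusion) / total_reference


# Hypothetical durations for one recording, not taken from the paper.
der = diarization_error_rate(false_alarm=30.0, missed=45.0, confusion=20.0,
                             total_reference=600.0)
print(f"DER = {der:.2%}")  # DER = 15.83%
```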
Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph Attention Networks refine speaker embeddings
Label Propagation Algorithm detects overlapping communities
Combined method reduces Diarization Error Rate to 15.94% on DIHARD-III without oracle VAD (11.07% with oracle VAD)
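The overlapping assignment in the second component can be illustrated with a COPRA-style multi-label propagation (after Gregory, 2010), swapped in here as a stand-in because the page does not give the paper's exact update rule: every node keeps a set of community labels with belonging coefficients, repeatedly averages its neighbours' coefficients, drops labels below 1/v, and renormalises. A node that keeps two labels belongs to two communities, which in diarization terms would mean a segment attributed to two speakers. `multi_label_propagation` and `v` are illustrative names; this sketch updates all nodes synchronously and stops at a fixed point or after `iterations` rounds.

```python
from typing import Dict, List


def multi_label_propagation(adj: List[List[int]], v: int,
                            iterations: int = 20) -> List[Dict[int, float]]:
    """COPRA-style overlapping label propagation with synchronous updates.

    adj: neighbour lists of an undirected graph (e.g. a segment affinity graph)
    v:   maximum communities per node; labels whose belonging coefficient
         falls below 1/v are dropped
    Returns one {community label: belonging coefficient} dict per node.
    """
    n = len(adj)
    labels: List[Dict[int, float]] = [{i: 1.0} for i in range(n)]  # own community
    for _ in range(iterations):
        new_labels: List[Dict[int, float]] = []
        for i in range(n):
            if not adj[i]:
                new_labels.append(dict(labels[i]))  # isolated node keeps its labels
                continue
            # Average the neighbours' belonging coefficients (sums to 1).
            acc: Dict[int, float] = {}
            for j in adj[i]:
                for c, b in labels[j].items():
                    acc[c] = acc.get(c, 0.0) + b / len(adj[i])
            kept = {c: b for c, b in acc.items() if b >= 1.0 / v}
            if not kept:
                # Every label is below threshold: keep only the strongest one
                # (ties broken toward the smallest label id for determinism).
                c_best, b_best = max(acc.items(), key=lambda cb: (cb[1], -cb[0]))
                kept = {c_best: b_best}
            total = sum(kept.values())
            new_labels.append({c: b / total for c, b in kept.items()})
        if new_labels == labels:  # fixed point reached
            break
        labels = new_labels
    return labels


# Toy run: two triangles sharing node 2, each node may keep up to two labels.
bridge = [[1, 2], [0, 2], [0, 1, 3, 4], [2, 4], [2, 3]]
for node, membership in enumerate(multi_label_propagation(bridge, v=2)):
    print(node, membership)
```

Synchronous updates can oscillate on some graphs, which is why the iteration cap matters; asynchronous (node-by-node) updates are a common alternative.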

Authors

Zhaoyang Li
Ph.D. student, University of Science and Technology of China
Computer Vision

Jie Wang
School of Electronic Science and Engineering, Xiamen University, China

Xiaoxiao Li
School of Electronic Information, Beijing Jiaotong University, China

Wangjie Li
School of Electronic Science and Engineering, Xiamen University, China

Longjie Luo
Xiamen University
Speech signal processing

Lin Li
School of Electronic Science and Engineering, Xiamen University, China

Q. Hong
School of Informatics, Xiamen University, China