Overlap-Adaptive Hybrid Speaker Diarization and ASR-Aware Observation Addition for MISP 2025 Challenge

📅 2025-05-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work targets two coupled problems in the MISP 2025 Challenge: speaker diarization under overlapping speech and ASR at low signal-to-noise ratio (SNR). First, it introduces an overlap-adaptive hybrid diarization framework that combines WavLM-based end-to-end segmentation, traditional clustering (AHC/i-vector), and guided source separation (GSS). Second, it proposes an ASR-aware observation addition method that compensates for the degradation of GSS in noisy conditions. Third, it cascades the speaker diarization (SD) and ASR systems into a single pipeline. The system ranked first in both Track 2 (CER: 9.48%) and Track 3 (cpCER: 11.56%), indicating that the approach holds up in realistic meeting scenarios with overlapping speech and low SNR.

📝 Abstract
This paper presents the system developed for the MISP 2025 Challenge. For the diarization system, we propose a hybrid approach that combines a WavLM end-to-end segmentation method with a traditional multi-module clustering technique and adaptively selects the appropriate model according to the degree of overlapping speech. For the automatic speech recognition (ASR) system, we propose an ASR-aware observation addition method that compensates for the performance limitations of Guided Source Separation (GSS) under low signal-to-noise ratio conditions. Finally, we integrate the speaker diarization and ASR systems in a cascaded architecture to address Track 3. Our system achieved a character error rate (CER) of 9.48% on Track 2 and a concatenated minimum-permutation character error rate (cpCER) of 11.56% on Track 3, securing first place in both tracks and demonstrating the effectiveness of the proposed methods in real-world meeting scenarios.
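The two metrics quoted in the abstract can be illustrated with a minimal sketch. It assumes the usual definition of cpCER: each speaker's utterances are concatenated, the hypothesis speakers are permuted to minimize the total character edit distance, and that total is divided by the number of reference characters. The function names `edit_distance` and `cp_cer` are illustrative and are not taken from the challenge scoring tools.

```python
from itertools import permutations


def edit_distance(ref, hyp):
    """Character-level Levenshtein distance between two strings."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def cp_cer(refs, hyps):
    """cpCER over per-speaker concatenated transcripts (assumed definition).

    refs, hyps: lists of strings, one concatenated transcript per speaker.
    Brute-force search over permutations; fine for a handful of speakers.
    """
    n = max(len(refs), len(hyps))
    refs = list(refs) + [""] * (n - len(refs))  # pad missing speakers
    hyps = list(hyps) + [""] * (n - len(hyps))
    total_ref = sum(len(r) for r in refs)
    if total_ref == 0:
        raise ValueError("reference transcripts are empty")
    best = min(
        sum(edit_distance(r, h) for r, h in zip(refs, perm))
        for perm in permutations(hyps)
    )
    return best / total_ref
```

With a single speaker, `cp_cer` reduces to the plain CER reported for Track 2; with several speakers it penalizes both recognition errors and speaker attribution errors, which is why it is the Track 3 metric.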
Problem

Research questions and friction points this paper is trying to address.

Hybrid diarization that adapts to varying degrees of overlapping speech
ASR-aware method to improve recognition at low SNR
Integrated diarization and ASR system for real-world meeting scenarios
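The idea of choosing a diarization model according to the degree of overlap can be sketched as follows. This is only an illustration under stated assumptions: the summary does not say how overlap is measured or which model handles which regime, so the overlap-ratio statistic, the `threshold` value, and the routing of high-overlap sessions to the end-to-end model are all hypothetical.

```python
def overlap_ratio(segments):
    """Fraction of speech time during which two or more speakers are active.

    segments: list of (start_sec, end_sec, speaker) from a preliminary pass.
    """
    events = []
    for start, end, _speaker in segments:
        events.append((start, 1))
        events.append((end, -1))
    # At equal timestamps, -1 sorts before +1, so back-to-back
    # segments are not counted as overlapping.
    events.sort()

    active, speech, overlap, prev_t = 0, 0.0, 0.0, None
    for t, delta in events:
        if prev_t is not None and active > 0:
            dur = t - prev_t
            speech += dur
            if active >= 2:
                overlap += dur
        active += delta
        prev_t = t
    return overlap / speech if speech > 0 else 0.0


def choose_diarizer(segments, threshold=0.2):
    """Hypothetical routing rule: heavy overlap -> end-to-end segmentation,
    light overlap -> clustering pipeline. The threshold is an assumed value."""
    if overlap_ratio(segments) >= threshold:
        return "e2e_segmentation"
    return "clustering"
```

For example, `choose_diarizer([(0, 10, "A"), (5, 15, "B")])` selects the end-to-end branch because a third of the speech time is overlapped, while two back-to-back turns select the clustering branch.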
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid of WavLM end-to-end segmentation and clustering for overlap-adaptive diarization
ASR-aware observation addition to compensate for GSS limitations at low SNR
Cascaded diarization-ASR integration for robust results
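Observation addition is commonly understood as mixing part of the original (unprocessed) observation back into the enhanced signal so that separation artifacts hurt the recognizer less. The sketch below shows that idea as a simple interpolation, and picks the mixing weight with a scoring callback. The summary does not describe how the ASR-aware part is actually implemented, so the interpolation form, the candidate weights, and the `score_fn` callback (for instance an ASR confidence score) are assumptions made for illustration.

```python
def observation_addition(enhanced, observed, weight):
    """Interpolate the enhanced (e.g. GSS output) and observed waveforms.

    weight = 0.0 keeps only the enhanced signal;
    weight = 1.0 keeps only the original observation.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if len(enhanced) != len(observed):
        raise ValueError("signals must have the same length")
    return [(1.0 - weight) * e + weight * o
            for e, o in zip(enhanced, observed)]


def pick_weight(enhanced, observed, score_fn,
                candidates=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Return the candidate weight whose mixed signal scores highest.

    score_fn is a hypothetical callback mapping a waveform to a
    higher-is-better score, such as an ASR confidence estimate.
    """
    return max(
        candidates,
        key=lambda w: score_fn(observation_addition(enhanced, observed, w)),
    )
```

A grid search over a few weights keeps the sketch simple; a real system could instead predict the weight per utterance from an SNR or confidence estimate.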
Shangkun Huang
Beijing Fosafer Information Technology Co., Ltd., China
Yuxuan Du
Nanyang Technological University
Quantum machine learning, Quantum computing, AI for Quantum Science
Jingwen Yang
Beijing Fosafer Information Technology Co., Ltd., China
Dejun Zhang
Beijing Fosafer Information Technology Co., Ltd., China
Xupeng Jia
Beijing Fosafer Information Technology Co., Ltd., China
Jing Deng
Beijing Fosafer Information Technology Co., Ltd., China
Jintao Kang
Institute of Forensic Science, Ministry of Public Security, China; The Institute of Linguistics, Chinese Academy of Social Sciences, China
Rong Zheng
Beijing Fosafer Information Technology Co., Ltd., China