Speaker Diarization for Low-Resource Languages Through Wav2vec Fine-Tuning

📅 2025-04-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
For low-resource languages such as Kurdish, speaker diarization performance degrades severely because labeled data are scarce, dialects vary widely, and code-switching is frequent. To address these challenges, this paper proposes a Wav2Vec 2.0 fine-tuning framework tailored to Kurdish. The method applies domain-adaptive fine-tuning on a newly constructed Kurdish speech corpus, uses cross-lingual transfer learning to capture the language's phonetic and acoustic characteristics, and clusters the resulting speaker embeddings as a post-processing step. Relative to a baseline method, the approach reduces the diarization error rate by 7.2% and improves cluster purity by 13%. This work establishes a foundation for speaker diarization in under-resourced multilingual settings, with direct applicability to media transcription and call-center analytics.
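The summary says speaker embeddings are clustered as a post-processing step but does not name the clustering algorithm. As a minimal sketch, assuming average-linkage agglomerative clustering on cosine similarity (a common choice in diarization pipelines, not confirmed as this paper's method), the step could look like this. The function names, the merge threshold, and the toy two-dimensional embeddings are hypothetical; real embeddings would come from the fine-tuned model.

```python
import math
from typing import List

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def agglomerative_cluster(embeddings: List[Vector], threshold: float) -> List[int]:
    """Average-linkage agglomerative clustering on cosine similarity.

    Repeatedly merges the most similar pair of clusters until no pair has
    an average similarity of at least `threshold`. Returns one integer
    speaker label per input embedding.
    """
    clusters: List[List[int]] = [[i] for i in range(len(embeddings))]

    def average_similarity(c1: List[int], c2: List[int]) -> float:
        # Average-linkage: mean similarity over all cross-cluster pairs.
        total = sum(cosine_similarity(embeddings[i], embeddings[j])
                    for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best_pair = None
        best_sim = threshold
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                sim = average_similarity(clusters[p], clusters[q])
                if sim >= best_sim:
                    best_sim, best_pair = sim, (p, q)
        if best_pair is None:
            break  # no remaining pair is similar enough to merge
        p, q = best_pair
        clusters[p].extend(clusters[q])
        del clusters[q]  # q > p, so index p is unaffected

    # Assign one label per cluster, in cluster order.
    labels = [0] * len(embeddings)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


if __name__ == "__main__":
    # Hypothetical 2-D "speaker embeddings": two tight pairs.
    toy = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
    print(agglomerative_cluster(toy, threshold=0.8))  # [0, 0, 1, 1]
```

With four toy embeddings forming two tight pairs, a threshold of 0.8 yields two speakers. Lowering the threshold merges more aggressively (fewer speakers) and raising it splits more (more speakers), so in practice it would be tuned on held-out data.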

📝 Abstract
Speaker diarization is a fundamental task in speech processing that involves partitioning an audio stream by speaker. Although state-of-the-art models have advanced performance in high-resource languages, low-resource languages such as Kurdish pose unique challenges due to limited annotated data, multiple dialects, and frequent code-switching. In this study, we address these issues by fine-tuning the Wav2Vec 2.0 self-supervised learning model on a dedicated Kurdish corpus. By leveraging transfer learning, we adapted multilingual representations learned from other languages to capture the phonetic and acoustic characteristics of Kurdish speech. Relative to a baseline method, our approach reduced the diarization error rate by 7.2% and improved cluster purity by 13%. These findings demonstrate that enhancements to existing models can significantly improve diarization performance for under-resourced languages. Our work has practical implications for developing transcription services for Kurdish-language media and for speaker segmentation in multilingual call centers, teleconferencing, and video-conferencing systems. The results establish a foundation for building effective diarization systems in other understudied languages, contributing to greater equity in speech technology.
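The abstract reports two metrics, diarization error rate (DER) and cluster purity, without stating the scoring tool, collar, or overlap handling. Here is a simplified, frame-level version under stated assumptions: each frame carries at most one reference and one hypothesis speaker label (no overlapping speech), `None` marks non-speech, and no collar is applied. DER is computed as (missed speech + false alarm + speaker confusion) divided by total reference speech, under the best one-to-one mapping of hypothesis clusters to reference speakers; purity is the fraction of frames labeled as speech in both sequences that belong to the majority reference speaker of their cluster. This is not the NIST `md-eval` scorer or any particular library's implementation, and the function names are hypothetical.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, List, Optional

# One speaker label per frame; None marks non-speech.
Labels = List[Optional[str]]


def frame_der(reference: Labels, hypothesis: Labels) -> float:
    """Simplified frame-level DER (single speaker per frame, no collar)."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    total_speech = sum(r is not None for r in reference)
    if total_speech == 0:
        raise ValueError("reference contains no speech")
    ref_speakers = sorted({r for r in reference if r is not None})
    hyp_speakers = sorted({h for h in hypothesis if h is not None})

    # Pad with None so surplus hypothesis clusters can remain unmapped.
    candidates: List[Optional[str]] = ref_speakers + [None] * len(hyp_speakers)
    best_errors = None
    # Brute-force every one-to-one mapping of hypothesis clusters to speakers.
    for assignment in permutations(candidates, len(hyp_speakers)):
        mapping = dict(zip(hyp_speakers, assignment))
        errors = 0
        for r, h in zip(reference, hypothesis):
            if r is None and h is None:
                continue        # non-speech correctly left unlabeled
            if r is None or h is None:
                errors += 1     # false alarm or missed speech
            elif mapping[h] != r:
                errors += 1     # speaker confusion
        if best_errors is None or errors < best_errors:
            best_errors = errors
    return best_errors / total_speech


def cluster_purity(reference: Labels, hypothesis: Labels) -> float:
    """Share of speech frames that match their cluster's majority speaker."""
    counts: Dict[str, Counter] = {}
    for r, h in zip(reference, hypothesis):
        if r is not None and h is not None:
            counts.setdefault(h, Counter())[r] += 1
    total = sum(sum(c.values()) for c in counts.values())
    if total == 0:
        raise ValueError("no frames are labeled as speech in both sequences")
    majority = sum(c.most_common(1)[0][1] for c in counts.values())
    return majority / total


if __name__ == "__main__":
    ref = ["A", "A", "A", "B", "B", None]
    hyp = ["s1", "s1", "s2", "s2", "s2", "s2"]
    print(frame_der(ref, hyp))       # 0.4
    print(cluster_purity(ref, hyp))  # 0.8
```

In the toy example, one frame of speaker `A` lands in the wrong cluster and one non-speech frame is labeled as speech, giving 2 errors over 5 reference speech frames (DER = 0.4), while purity is 4/5 = 0.8. Brute-forcing the mapping is only practical for a handful of speakers; production scorers solve it as an assignment problem (for example, with the Hungarian algorithm).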
Problem

Improving speaker diarization for the low-resource Kurdish language
Addressing scarce annotated data, dialectal variation, and code-switching in Kurdish speech
Enhancing diarization accuracy using Wav2Vec 2.0 fine-tuning
Innovation

Fine-tuned Wav2Vec 2.0 for Kurdish speech
Used transfer learning to adapt multilingual representations to Kurdish
Reduced the diarization error rate by 7.2% and improved cluster purity by 13% relative to a baseline method
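The list above says Wav2Vec 2.0 was fine-tuned for Kurdish, but the page does not describe how frame-level model outputs become the per-segment speaker embeddings that are later clustered. One common approach, shown here only as an assumption, is to mean-pool frame features over fixed sliding windows. The 20 ms frame rate follows from Wav2Vec 2.0's 320-sample encoder stride at 16 kHz; the window and hop lengths, function names, and toy features are hypothetical, and no model is loaded.

```python
from typing import List

Vector = List[float]

# Wav2Vec 2.0's convolutional feature encoder has a total stride of 320
# samples, i.e. roughly one output frame per 20 ms of 16 kHz audio.
FRAME_SECONDS = 0.02


def seconds_to_frames(seconds: float) -> int:
    """Convert a duration in seconds to a whole number of encoder frames."""
    return round(seconds / FRAME_SECONDS)


def window_embeddings(frames: List[Vector], window: int, hop: int) -> List[Vector]:
    """Mean-pool frame-level features over sliding windows of `window` frames.

    Windows start every `hop` frames; a trailing partial window is dropped.
    """
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    if not frames:
        return []
    dim = len(frames[0])
    pooled: List[Vector] = []
    for start in range(0, len(frames) - window + 1, hop):
        chunk = frames[start:start + window]
        # Average each feature dimension across the frames in this window.
        pooled.append([sum(f[d] for f in chunk) / window for d in range(dim)])
    return pooled


if __name__ == "__main__":
    # Hypothetical settings: 1.5 s windows with a 0.5 s hop.
    print(seconds_to_frames(1.5), seconds_to_frames(0.5))  # 75 25
    # Toy frame features standing in for fine-tuned Wav2Vec 2.0 outputs.
    toy_frames = [[float(i), 1.0] for i in range(10)]
    print(window_embeddings(toy_frames, window=4, hop=2))
```

Each pooled vector summarizes one window and could be passed on to a clustering step. Shorter windows localize speaker changes more precisely but yield noisier embeddings; longer windows trade the other way.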
Authors
Abdulhady Abas Abdullah
Researcher in Artificial Intelligence UKH Centre
LLM, Prompt Engineering, NLP, Low-Resource Languages
Sarkhel H. Taher Karim
Computer Science Department, College of Science, University of Halabja, Kurdistan Region, Iraq.
Sara Azad Ahmed
Computer Engineering Department, Komar University of Science and Technology
Kanar R. Tariq
Information Technology Department, Technical College of Informatics, Sulaimani Polytechnic University, Sulaymaniyah, Iraq
Tarik A. Rashid
Artificial Intelligence and Innovation Centre, University of Kurdistan Hewler, Erbil, Iraq.