WenetSpeech-Chuan: A Large-Scale Sichuanese Corpus with Rich Annotation for Dialectal Speech Processing

📅 2025-09-22
🤖 AI Summary
Large-scale, open-source speech data is scarce for mainstream Chinese dialects, Sichuanese in particular, and this scarcity hinders ASR and TTS development. This work introduces WenetSpeech-Chuan, a 10,000-hour corpus presented as the largest open-source Sichuanese speech corpus to date. It is built with Chuan-Pipeline, a complete data processing framework for dialectal speech that integrates ASR-based pre-annotation, text-speech alignment, pronunciation variant modeling, and multi-stage human verification for data cleaning and fine-grained annotation. The release also includes WenetSpeech-Chuan-Eval, a set of ASR and TTS benchmarks with manually verified transcriptions. Models trained on WenetSpeech-Chuan achieve state-of-the-art performance among open-source systems and results comparable to commercial services. The authors position the corpus as lowering the barrier to dialectal speech research and as a contribution to AI equity and bias mitigation in speech technology.

📝 Abstract
The scarcity of large-scale, open-source data for dialects severely hinders progress in speech technology, a challenge particularly acute for the widely spoken Sichuanese dialects of Chinese. To address this critical gap, we introduce WenetSpeech-Chuan, a 10,000-hour, richly annotated corpus constructed using our novel Chuan-Pipeline, a complete data processing framework for dialectal speech. To facilitate rigorous evaluation and demonstrate the corpus's effectiveness, we also release high-quality ASR and TTS benchmarks, WenetSpeech-Chuan-Eval, with manually verified transcriptions. Experiments show that models trained on WenetSpeech-Chuan achieve state-of-the-art performance among open-source systems and demonstrate results comparable to commercial services. As the largest open-source corpus for Sichuanese dialects, WenetSpeech-Chuan not only lowers the barrier to research in dialectal speech processing but also plays a crucial role in promoting AI equity and mitigating bias in speech technologies. The corpus, benchmarks, models, and recipes are publicly available on our project page.
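The abstract does not say which metric the ASR benchmark uses. A common choice for Chinese ASR is character error rate (CER), so the sketch below assumes CER and shows how hypotheses could be scored against manually verified transcriptions. The function names and the sample sentences are hypothetical and are not taken from the paper or its released recipes.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitute/insert/delete cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references, hypotheses):
    """Corpus-level CER: total character edits divided by total reference characters."""
    total_errors = 0
    total_chars = 0
    for ref, hyp in zip(references, hypotheses):
        # Assumption: spaces are ignored, since Chinese text is scored per character.
        ref_chars = list(ref.replace(" ", ""))
        hyp_chars = list(hyp.replace(" ", ""))
        total_errors += edit_distance(ref_chars, hyp_chars)
        total_chars += len(ref_chars)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    return total_errors / total_chars


# Illustrative sentences only; they are not drawn from WenetSpeech-Chuan-Eval.
refs = ["你在做啥子"]
hyps = ["你在做什么"]
print(f"CER = {cer(refs, hyps):.2f}")
```

Real evaluation recipes usually normalize text (punctuation, digits, full-width characters) before scoring; that step is omitted here.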
Problem

Research questions and friction points this paper is trying to address.

Addressing the scarcity of large-scale open-source data for Sichuanese dialects
Providing rich annotations and benchmarks for dialectal speech processing research
Lowering barriers and mitigating bias in speech technology for underrepresented dialects
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale Sichuanese corpus with rich annotation
Novel Chuan-Pipeline for dialectal speech processing
High-quality ASR and TTS benchmarks released
Authors

Yuhang Dai
Audio, Speech and Language Processing Group (ASLP@NPU), School of Computer Science, Northwestern Polytechnical University
Ziyu Zhang
Audio, Speech and Language Processing Group (ASLP@NPU), School of Computer Science, Northwestern Polytechnical University
Shuai Wang
School of Intelligence Science and Technology, Nanjing University
Longhao Li
Audio, Speech and Language Processing Group (ASLP@NPU), School of Computer Science, Northwestern Polytechnical University
Zhao Guo
Audio, Speech and Language Processing Group (ASLP@NPU), School of Computer Science, Northwestern Polytechnical University
Tianlun Zuo
Audio, Speech and Language Processing Group (ASLP@NPU), School of Computer Science, Northwestern Polytechnical University
Shuiyuan Wang
Audio, Speech and Language Processing Group (ASLP@NPU), School of Computer Science, Northwestern Polytechnical University
Hongfei Xue
Audio, Speech and Language Processing Group (ASLP@NPU), School of Computer Science, Northwestern Polytechnical University
Chengyou Wang
Audio, Speech and Language Processing Group (ASLP@NPU), School of Computer Science, Northwestern Polytechnical University
Qing Wang
Institute of Artificial Intelligence (TeleAI), China Telecom
Xin Xu
Beijing AISHELL Technology Co., Ltd.
Hui Bu
AISHELL
Jie Li
Institute of Artificial Intelligence (TeleAI), China Telecom
Jian Kang
Institute of Artificial Intelligence (TeleAI), China Telecom
Binbin Zhang
School of Intelligence Science and Technology, Nanjing University
Lei Xie
Audio, Speech and Language Processing Group (ASLP@NPU), School of Computer Science, Northwestern Polytechnical University