Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-Trained BERT

📅 2019-09-15
🏛️ Interspeech
📈 Citations: 24
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses polyphone disambiguation, the core challenge of grapheme-to-phoneme (G2P) conversion in Chinese end-to-end speech synthesis. The authors propose the first end-to-end Chinese G2P framework leveraging pre-trained BERT: it takes a raw character sequence as input, requiring neither manual word segmentation nor phoneme pre-annotation, and uses BERT to encode contextual semantics; a neural classifier (fully-connected, LSTM, or Transformer based) then predicts the polyphonic character's pronunciation from the BERT output. The key contribution is the first application of BERT to Chinese polyphone disambiguation, enabling semantics-driven, end-to-end modeling. Experiments show that BERT features substantially improve disambiguation accuracy, outperforming an LSTM baseline on a standard test set, and that the length of the context window critically influences performance.

📝 Abstract
Grapheme-to-phoneme (G2P) conversion serves as an essential component in Chinese Mandarin text-to-speech (TTS) systems, where polyphone disambiguation is the core issue. In this paper, we propose an end-to-end framework to predict the pronunciation of a polyphonic character, which accepts a sentence containing the polyphonic character as input in the form of a Chinese character sequence, without the necessity of any preprocessing. The proposed method consists of a pre-trained bidirectional encoder representations from Transformers (BERT) model and a neural network (NN) based classifier. The pre-trained BERT model extracts semantic features from the raw Chinese character sequence, and the NN based classifier predicts the polyphonic character's pronunciation according to the BERT output. In our experiments, we implemented three classifiers: a fully-connected network based classifier, a long short-term memory (LSTM) network based classifier, and a Transformer block based classifier. The experimental results, compared with the baseline approach based on LSTM, demonstrate that the pre-trained model extracts effective semantic features, which greatly enhances the performance of polyphone disambiguation. In addition, we explored the impact of contextual information on polyphone disambiguation.
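The pipeline the abstract describes (raw character sequence in, contextual features around the polyphonic character, a classifier over the target position) can be sketched as follows. This is a minimal illustrative stand-in, not the authors' implementation: the hand-written rule replaces the pre-trained BERT encoder and trained NN classifier, and the names `PRONUNCIATIONS`, `context_window`, and `predict_pronunciation` are hypothetical.

```python
# Toy candidate set for one polyphone: 行 reads xing2 ("walk") or hang2 ("bank/row").
PRONUNCIATIONS = {"行": ["xing2", "hang2"]}

def context_window(chars, target_idx, radius):
    """Return up to `radius` characters on each side of the target character.
    The paper varies this window length to study the impact of context."""
    lo = max(0, target_idx - radius)
    hi = min(len(chars), target_idx + radius + 1)
    return chars[lo:hi]

def predict_pronunciation(sentence, target_idx, radius=2):
    """Stand-in for the BERT + classifier pipeline: choose a pronunciation
    for the character at `target_idx` from a toy rule over its context.
    No word segmentation or other preprocessing is applied to the input."""
    chars = list(sentence)
    window = context_window(chars, target_idx, radius)
    # Toy disambiguation rule standing in for the learned classifier:
    # 银 ("silver", as in 银行 "bank") in context -> hang2, else xing2.
    if "银" in window:
        return "hang2"
    return "xing2"

print(predict_pronunciation("我去银行取钱", 3))  # 行 in 银行 -> hang2
print(predict_pronunciation("他行走很快", 1))    # 行 in 行走 -> xing2
```

In the actual system, `context_window` plus the rule would be replaced by BERT's per-character hidden states and a trained FC/LSTM/Transformer classifier reading the state at the polyphonic character's position.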
Problem

Research questions and friction points this paper is trying to address.

Chinese Text-to-Speech
Polyphone Disambiguation
End-to-End System
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pre-trained BERT
Neural Network Classifier
G2P in TTS
Dongyang Dai
Unknown affiliation
Speech Synthesis, Computational Advertising, Machine Learning
Zhiyong Wu
Tsinghua-CUHK Joint Research Center for Media Sciences, Technologies and Systems, Graduate School at Shenzhen, Tsinghua University, Shenzhen, China; Beijing National Research Centre for Information Science and Technology (BNRist), Department of Computer Science and Technology, Tsinghua University, Beijing, China; Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR, China
Shiyin Kang
Tencent AI Lab, Tencent, Shenzhen, China
Xixin Wu
The Chinese University of Hong Kong
Jia Jia
Tsinghua-CUHK Joint Research Center for Media Sciences, Technologies and Systems, Graduate School at Shenzhen, Tsinghua University, Shenzhen, China; Beijing National Research Centre for Information Science and Technology (BNRist), Department of Computer Science and Technology, Tsinghua University, Beijing, China
Dan Su
Tencent AI Lab
Speech Recognition, Speech Synthesis, Speaker Recognition
Dong Yu
Tencent AI Lab, Tencent, Shenzhen, China
H. Meng
Tsinghua-CUHK Joint Research Center for Media Sciences, Technologies and Systems, Graduate School at Shenzhen, Tsinghua University, Shenzhen, China; Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR, China