🤖 AI Summary
To address weak decoding generalization in low-resource multilingual automatic speech recognition (ASR), this paper proposes a hierarchical Softmax (H-Softmax) decoder optimization method based on cross-lingual semantic embedding clustering. Unlike conventional H-Softmax approaches that construct Huffman trees from shallow statistical features such as token frequency, this method incorporates cross-lingual word-embedding similarity into the tree topology: K-means clustering groups semantically similar tokens, including tokens from different languages, under shared internal nodes, enabling low-resource languages to leverage semantic representations learned from high-resource languages. Evaluated on a downsampled multilingual ASR dataset spanning 15 languages, the proposed method yields significant word-error-rate reductions for low-resource languages. The results demonstrate that semantics-driven, cross-lingual sharing of decoder representations improves both decoding efficiency and recognition accuracy.
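The summary describes a two-step construction: cluster token embeddings with K-means, then use the clusters as shared internal nodes of a hierarchical softmax. A minimal sketch of that idea follows, assuming a two-level tree (the paper's tree may be deeper); the class name `TwoLevelHSoftmax`, the plain K-means routine, and all dimensions are illustrative, not the authors' implementation.

```python
import numpy as np

def logsumexp(x):
    """Numerically stable log-sum-exp over a 1-D array."""
    m = x.max()
    return m + np.log(np.exp(x - m).sum())

def kmeans(X, k, iters=20, seed=0):
    """Plain K-means over token embedding vectors; returns a cluster id per token."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(0)
    return labels

class TwoLevelHSoftmax:
    """Two-level hierarchical softmax whose tree is built from embedding
    clusters: semantically similar tokens -- including tokens from different
    languages -- share an internal (cluster) node, so low-resource tokens
    reuse cluster-level parameters shaped by high-resource data."""

    def __init__(self, token_embeddings, n_clusters, hidden_dim, seed=0):
        raw = kmeans(token_embeddings, n_clusters, seed=seed)
        # Compact the label ids so every cluster index has at least one member.
        _, self.labels = np.unique(raw, return_inverse=True)
        n_eff = int(self.labels.max()) + 1
        rng = np.random.default_rng(seed)
        # Output projections for the cluster level and the token level.
        self.W_cluster = 0.01 * rng.standard_normal((hidden_dim, n_eff))
        self.W_token = 0.01 * rng.standard_normal((hidden_dim, len(token_embeddings)))

    def log_prob(self, h, token):
        """log P(token | h) = log P(cluster | h) + log P(token | cluster, h)."""
        c = self.labels[token]
        cluster_logits = h @ self.W_cluster
        log_p_cluster = cluster_logits[c] - logsumexp(cluster_logits)
        members = np.flatnonzero(self.labels == c)  # softmax only inside the cluster
        token_logits = h @ self.W_token[:, members]
        idx = int(np.where(members == token)[0][0])
        log_p_token = token_logits[idx] - logsumexp(token_logits)
        return log_p_cluster + log_p_token
```

Because each token's probability is factored through its cluster node, scoring a token costs roughly O(n_clusters + |V|/n_clusters) instead of O(|V|), which is the efficiency gain hierarchical softmax provides regardless of how the tree is built; the clustering step is what adds the cross-lingual sharing.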
📝 Abstract
We present a novel approach centered on the decoding stage of Automatic Speech Recognition (ASR) that enhances multilingual performance, especially for low-resource languages. The approach uses cross-lingual embedding clustering to construct a hierarchical Softmax (H-Softmax) decoder, enabling similar tokens across different languages to share similar decoder representations. This addresses a limitation of the previous Huffman-based H-Softmax method, which relied on shallow features to assess token similarity. Through experiments on a downsampled dataset of 15 languages, we demonstrate the effectiveness of our approach in improving low-resource multilingual ASR accuracy.