🤖 AI Summary
This work addresses unsupervised word segmentation and vocabulary induction from untranscribed speech. We propose a lightweight two-stage approach: first, leveraging self-supervised speech representations (wav2vec 2.0), we compute dissimilarities between adjacent frames and estimate word-boundary confidence via sliding-window aggregation; second, we apply K-means clustering to the segmented units to induce a vocabulary. Departing from conventional dynamic-programming optimization frameworks, our method adopts an interpretable, low-overhead "boundary detection + clustering" paradigm. Evaluated on the five-language ZeroSpeech zero-resource benchmark, our approach achieves segmentation and lexicon quality competitive with the state-of-the-art ES-KMeans+ method, while accelerating inference by roughly 5× and substantially reducing computational cost, demonstrating both high accuracy and high efficiency.
📝 Abstract
We look at the long-standing problem of segmenting unlabeled speech into word-like segments and clustering these into a lexicon. Several previous methods use a scoring model coupled with dynamic programming to find an optimal segmentation. Here we propose a much simpler strategy: we predict word boundaries using the dissimilarity between adjacent self-supervised features, then we cluster the predicted segments to construct a lexicon. For a fair comparison, we update the older ES-KMeans dynamic programming method with better features and boundary constraints. On the five-language ZeroSpeech benchmarks, our simple approach matches the state-of-the-art results of the updated ES-KMeans+ method, while being almost five times faster. Project webpage: https://s-malan.github.io/prom-seg-clus.
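The "boundary detection + clustering" pipeline can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the cosine dissimilarity, the simple peak-picking rule, the mean-pooling of segments, and the toy k-means loop are all illustrative assumptions standing in for the paper's actual features, aggregation, and clustering setup.

```python
import numpy as np

def detect_boundaries(feats, threshold=0.5):
    """Mark a boundary wherever adjacent frames are dissimilar.

    feats: (T, D) array of per-frame self-supervised features.
    Returns segment boundaries as frame indices, including 0 and T.
    """
    a, b = feats[:-1], feats[1:]
    cos = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-8
    )
    dissim = 1.0 - cos  # high value = adjacent frames differ a lot
    bounds = [0]
    for t in range(1, len(dissim) - 1):
        # keep local peaks of the dissimilarity curve above the threshold
        if dissim[t] > threshold and dissim[t] >= dissim[t - 1] and dissim[t] >= dissim[t + 1]:
            bounds.append(t + 1)
    bounds.append(len(feats))
    return bounds

def pool_segments(feats, bounds):
    """Mean-pool the frames of each predicted segment into one vector."""
    return np.stack([feats[s:e].mean(axis=0) for s, e in zip(bounds[:-1], bounds[1:])])

def kmeans(X, k, iters=50, seed=0):
    """Tiny k-means: cluster pooled segment vectors into a k-word lexicon."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels
```

On a toy sequence of two constant "words" (ten frames of one vector followed by ten of another), the dissimilarity curve spikes once at the transition, giving one internal boundary and two segments that k-means assigns to different clusters. In contrast to ES-KMeans-style methods, no dynamic program over candidate segmentations is needed: boundaries are fixed first, then clustering runs once over the pooled segments.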