Automatic classification of stop realisation with wav2vec2.0

📅 2025-05-29
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses the automatic detection of stop bursts, a core phenomenon in speech science and a challenging case of fine-grained phonological variation. We propose an end-to-end binary classification framework built on wav2vec 2.0. Methodologically, we are the first to adapt wav2vec 2.0 to millisecond-level stop-realisation discrimination, eliminating the need for phoneme alignment or hand-crafted feature engineering: we directly leverage pretrained frame-level representations and attach a lightweight classifier head. Evaluated on cross-lingual (English and Japanese) and cross-quality (gold-standard vs. raw) corpora, the approach achieves high accuracy, with Pearson correlation coefficients exceeding 0.95 between automatically and manually annotated variant distributions. The key contribution is overcoming the limitations of conventional tools in modeling variable phonological phenomena, enabling high-fidelity, scalable automatic annotation of large-scale speech corpora and thereby establishing a robust computational infrastructure for empirical research on phonological variation.
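The "pretrained frame-level representations plus lightweight classifier head" design can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: it assumes a frozen upstream encoder (e.g. wav2vec 2.0 base, whose frame features are 768-dimensional) and stands in a random tensor for those features; the class name, pooling choice, and dimensions are assumptions for illustration.

```python
import torch
import torch.nn as nn

class StopBurstClassifier(nn.Module):
    """Lightweight binary head over frame-level speech representations.

    Assumes an upstream (e.g. frozen wav2vec 2.0) encoder that outputs
    features of shape (batch, frames, hidden); hidden=768 for the base model.
    """

    def __init__(self, hidden_dim: int = 768):
        super().__init__()
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, frame_features: torch.Tensor) -> torch.Tensor:
        # Mean-pool across the frame axis, then map to a single logit
        # per clip: burst present vs. absent.
        pooled = frame_features.mean(dim=1)
        return self.head(pooled).squeeze(-1)

# Illustrative use: 2 clips, 49 frames (~1 s at a 20 ms hop), 768 dims,
# standing in for real wav2vec 2.0 encoder output.
features = torch.randn(2, 49, 768)
clf = StopBurstClassifier()
logits = clf(features)
probs = torch.sigmoid(logits)  # per-clip probability of burst presence
```

Training such a head with a binary cross-entropy loss while keeping the encoder frozen (or lightly fine-tuned) is what makes the approach cheap to adapt across languages and corpora.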

📝 Abstract
Modern phonetic research regularly makes use of automatic tools for the annotation of speech data; however, few tools exist for the annotation of many variable phonetic phenomena. At the same time, pre-trained self-supervised models, such as wav2vec2.0, have been shown to perform well at speech classification tasks and latently encode fine-grained phonetic information. We demonstrate that wav2vec2.0 models can be trained to automatically classify stop burst presence with high accuracy in both English and Japanese, robust across both finely-curated and unprepared speech corpora. Patterns of variability in stop realisation are replicated with the automatic annotations, and closely follow those of manual annotations. These results demonstrate the potential of pre-trained speech models as tools for the automatic annotation and processing of speech corpus data, enabling researchers to 'scale up' the scope of phonetic research with relative ease.
Problem

Research questions and friction points this paper is trying to address.

Classifying stop burst presence using wav2vec2.0
Enabling automatic annotation of variable phonetic phenomena
Scaling phonetic research with pre-trained speech models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses wav2vec2.0 for stop classification
Trains model for English and Japanese
Replicates the patterns of variability found in manual annotations