Prominence-aware automatic speech recognition for conversational speech

📅 2025-09-12
🤖 AI Summary
This study investigates prominence-aware automatic speech recognition (ASR) for conversational Austrian German, combining word-level transcription with prosodic prominence detection. The authors first fine-tune wav2vec2 models as word-level prominence detectors, then use a detector to annotate prosodic prominence automatically in a large corpus. On those annotations they train prominence-aware ASR systems that transcribe words and their prominence levels simultaneously. Adding prominence information did not change recognition performance relative to the baseline ASR system, and prominence detection reached 85.53% accuracy on utterances whose recognized word sequence was correct. The results indicate that transformer-based models can encode prosodic information, with potential applications in linguistic research and prosody-informed dialogue systems.

📝 Abstract
This paper investigates prominence-aware automatic speech recognition (ASR) by combining prominence detection and speech recognition for conversational Austrian German. First, prominence detectors were developed by fine-tuning wav2vec2 models to classify word-level prominence. The detector was then used to automatically annotate prosodic prominence in a large corpus. Based on those annotations, we trained novel prominence-aware ASR systems that simultaneously transcribe words and their prominence levels. The integration of prominence information did not change performance compared to our baseline ASR system, while reaching a prominence detection accuracy of 85.53% for utterances where the recognized word sequence was correct. This paper shows that transformer-based models can effectively encode prosodic information and represents a novel contribution to prosody-enhanced ASR, with potential applications for linguistic research and prosody-informed dialogue systems.
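The reported accuracy is conditional: it is measured only on utterances whose recognized word sequence was correct. A minimal sketch of such a metric follows, assuming word-level scoring and integer prominence levels. The function name and the `(word, level)` data layout are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical layout: one utterance is a list of (word, prominence_level) pairs.
Utterance = List[Tuple[str, int]]


def prominence_accuracy_on_correct_words(
    references: List[Utterance], hypotheses: List[Utterance]
) -> float:
    """Word-level prominence accuracy, restricted to utterances whose
    recognized word sequence exactly matches the reference."""
    correct = 0
    total = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words = [word for word, _ in ref]
        hyp_words = [word for word, _ in hyp]
        if ref_words != hyp_words:
            # Skip utterances with any recognition error: prominence labels
            # cannot be compared one-to-one when the words differ.
            continue
        for (_, ref_level), (_, hyp_level) in zip(ref, hyp):
            total += 1
            if ref_level == hyp_level:
                correct += 1
    # No utterance was recognized correctly: the metric is undefined.
    return correct / total if total else float("nan")


refs = [[("das", 0), ("ist", 1)], [("ja", 1)]]
hyps = [[("das", 0), ("ist", 0)], [("na", 1)]]
print(prominence_accuracy_on_correct_words(refs, hyps))
```

In this toy input, the second utterance is excluded because its word was misrecognized, so only the two words of the first utterance are scored.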
Problem

Research questions and friction points this paper is trying to address.

Develops prominence-aware ASR for conversational Austrian German
Integrates word transcription with prosodic prominence detection
Evaluates transformer models for prosody-enhanced speech recognition
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuned wav2vec2 models for prominence detection
Automatically annotated prosodic prominence in corpus
Trained ASR systems transcribing words and prominence simultaneously
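One way to let a single ASR system emit words and prominence together is to attach a prominence tag to each word in the training transcripts. The sketch below illustrates that idea only; the `_P<level>` suffix format and both function names are hypothetical, and the label representation actually used in the paper is not stated here.

```python
from typing import List, Tuple


def tag_transcript(words: List[str], levels: List[int], sep: str = "_P") -> str:
    """Merge words and automatically annotated prominence levels into one
    target string (hypothetical format: word + sep + level)."""
    if len(words) != len(levels):
        raise ValueError("each word needs exactly one prominence level")
    return " ".join(f"{word}{sep}{level}" for word, level in zip(words, levels))


def untag_transcript(tagged: str, sep: str = "_P") -> Tuple[List[str], List[int]]:
    """Split a tagged hypothesis back into words and prominence levels."""
    words: List[str] = []
    levels: List[int] = []
    for token in tagged.split():
        # rpartition splits on the last separator, so words that happen to
        # contain the separator earlier in the string are kept intact.
        word, _, level = token.rpartition(sep)
        words.append(word)
        levels.append(int(level))
    return words, levels


target = tag_transcript(["das", "ist", "gut"], [0, 0, 2])
print(target)
print(untag_transcript(target))
```

Splitting the tags off again recovers a plain word sequence, so recognition performance can be scored against a baseline separately from prominence accuracy.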
Julian Linke
Signal Processing and Speech Communication Laboratory, Graz University of Technology, Austria
Barbara Schuppler
SPSC Laboratory, Graz University of Technology
Phonetics, Speech Technology, Linguistics, Inclusive Technologies, Signal Processing