Breathing and Semantic Pause Detection and Exertion-Level Classification in Post-Exercise Speech

📅 2025-09-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the fine-grained detection of semantic, respiratory, and hybrid (respiratory-semantic) pauses in post-exercise speech, along with exertion-level classification. We introduce the first multi-type pause annotation dataset for post-exercise speech and a hierarchical cascaded modeling framework. Methodologically, we integrate Wav2Vec2’s hierarchical representations with conventional acoustic features (MFCCs/MFBs), and design both single-model and two-stage cascaded architectures compatible with GRU, CNN-LSTM, AlexNet, and VGG16 for joint multi-task learning. Our key contribution lies in the first systematic annotation and joint recognition of all three pause types in post-exercise speech, enhanced by feature–task co-design to improve generalization. Experiments show pause detection accuracies of 89% (semantic), 55% (respiratory), 86% (hybrid), and 73% overall; exertion-level classification achieves 90.5% accuracy—substantially outperforming prior approaches.
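The feature–task co-design above pairs Wav2Vec2 representations with conventional MFCCs. One common way to fuse two such streams is to align their frame rates and concatenate per frame; the sketch below illustrates this with assumed dimensions (768-dim Wav2Vec2 frames at ~50 Hz, 13 MFCCs at ~100 Hz) and average pooling, which are illustrative choices, not the paper's exact configuration:

```python
import numpy as np

# Hypothetical shapes: Wav2Vec2 emits ~50 frames/s of 768-dim vectors,
# while MFCCs are often computed at ~100 frames/s with 13 coefficients.
rng = np.random.default_rng(0)
w2v = rng.normal(size=(50, 768))   # 1 s of Wav2Vec2 features (placeholder values)
mfcc = rng.normal(size=(100, 13))  # 1 s of MFCC features (placeholder values)

def align_and_fuse(w2v, mfcc):
    """Average-pool the faster stream down to the slower frame rate, then concatenate."""
    ratio = mfcc.shape[0] // w2v.shape[0]        # 2 MFCC frames per Wav2Vec2 frame
    pooled = mfcc[: ratio * w2v.shape[0]]
    pooled = pooled.reshape(w2v.shape[0], ratio, -1).mean(axis=1)
    return np.concatenate([w2v, pooled], axis=1)  # (T, 768 + 13)

fused = align_and_fuse(w2v, mfcc)
print(fused.shape)  # (50, 781)
```

The fused frame sequence would then feed the downstream GRU/CNN models; interpolation or learned projection are equally valid alignment choices.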

📝 Abstract
Post-exercise speech contains rich physiological and linguistic cues, often marked by semantic pauses, breathing pauses, and combined breathing-semantic pauses. Detecting these events enables assessment of recovery rate, lung function, and exertion-related abnormalities. However, existing work on identifying and distinguishing different types of pauses in this context is limited. In this work, building on a recently released dataset with synchronized audio and respiration signals, we provide systematic annotations of pause types. Using these annotations, we conduct exploratory breathing and semantic pause detection and exertion-level classification across deep learning models (GRU, 1D CNN-LSTM, AlexNet, VGG16), acoustic features (MFCC, MFB), and layer-stratified Wav2Vec2 representations. We evaluate three setups (single feature, feature fusion, and a two-stage detection-classification cascade) under both classification and regression formulations. Results show per-type detection accuracy up to 89% for semantic, 55% for breathing, and 86% for combined pauses, with 73% overall, while exertion-level classification achieves 90.5% accuracy, outperforming prior work.
Problem

Research questions and friction points this paper is trying to address.

Detect breathing and semantic pauses in post-exercise speech
Classify exertion levels using acoustic and deep learning models
Improve assessment of physiological recovery and lung function
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep learning models for pause detection
Feature fusion and cascade classification
Wav2Vec2 representations for exertion analysis
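The cascade classification listed above (first detect pause frames, then label each detected pause as semantic, breathing, or hybrid) can be illustrated with stand-in components. The energy threshold and per-type score matrix below are hypothetical placeholders, not the paper's trained networks:

```python
import numpy as np

PAUSE_TYPES = ["semantic", "breathing", "hybrid"]

def stage1_detect(energy, threshold=0.1):
    """Stage 1: flag low-energy frames as pause candidates (placeholder detector)."""
    return energy < threshold

def stage2_classify(scores):
    """Stage 2: label each pause frame via argmax over per-type scores (placeholder)."""
    return np.argmax(scores, axis=1)

# Toy per-frame energies and [semantic, breathing, hybrid] scores (made-up values).
energy = np.array([0.5, 0.05, 0.02, 0.6, 0.03])
scores = np.array([
    [0.1, 0.2, 0.7],
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.3, 0.4],
    [0.1, 0.1, 0.8],
])

mask = stage1_detect(energy)                      # frames 1, 2, 4 are pauses
labels = np.full(len(energy), "speech", dtype=object)
idx = np.where(mask)[0]
labels[idx] = [PAUSE_TYPES[k] for k in stage2_classify(scores[idx])]
print(list(labels))  # ['speech', 'semantic', 'breathing', 'speech', 'hybrid']
```

The appeal of the cascade is that the (harder) type classifier only runs on frames the detector has already flagged, letting each stage specialize on its own sub-task.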