Detecting COPD Through Speech Analysis: A Dataset of Danish Speech and Machine Learning Approach

📅 2025-08-04

📈 Citations: 0

✨ Influential: 0

career value

188K/year

🤖 AI Summary

Language bias limits the cross-linguistic applicability of voice-based screening for chronic obstructive pulmonary disease (COPD), particularly in non-English-speaking populations. Method: To address this, we constructed the first Danish-language COPD voice dataset, comprising recordings of reading, coughing, and sustained vowel phonation. Acoustic features were extracted using openSMILE, and x-vector embeddings were integrated with machine learning models—including logistic regression—for binary COPD classification. Contribution/Results: A logistic regression model trained on handcrafted acoustic features achieved 67% accuracy, demonstrating that COPD-related vocal biomarkers are linguistically transferable. This work fills a critical gap in Nordic-language COPD voice data and provides methodological foundations and empirical evidence for low-cost, large-scale, multilingual remote respiratory screening.

Technology Category

Application Category

📝 Abstract

Chronic Obstructive Pulmonary Disease (COPD) is a serious and debilitating disease affecting millions around the world. Its early detection using non-invasive means could enable preventive interventions that improve quality of life and patient outcomes, with speech recently shown to be a valuable biomarker. Yet, its validity across different linguistic groups remains to be seen. To that end, audio data were collected from 96 Danish participants conducting three speech tasks (reading, coughing, sustained vowels). Half of the participants were diagnosed with different levels of COPD and the other half formed a healthy control group. Subsequently, we investigated different baseline models using openSMILE features and learnt x-vector embeddings. We obtained a best accuracy of 67% using openSMILE features and logistic regression. Our findings support the potential of speech-based analysis as a non-invasive, remote, and scalable screening tool as part of future COPD healthcare solutions.

Problem

Research questions and friction points this paper is trying to address.

Detecting COPD using non-invasive speech analysis

Validating speech biomarkers across different linguistic groups

Developing machine learning models for COPD screening

Innovation

Methods, ideas, or system contributions that make the work stand out.

Using Danish speech dataset for COPD detection

Employing openSMILE features and x-vector embeddings

Achieving 67% accuracy with logistic regression

🔎 Similar Papers

No similar papers found.