Evaluating Spoken Language as a Biomarker for Automated Screening of Cognitive Impairment

📅 2025-01-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Early, non-invasive screening for Alzheimer's disease and related dementias (ADRD) is limited by the generalisability and interpretability of machine learning models and by restricted access to patient data for training. This work evaluates an interpretable machine learning approach that pairs risk stratification with linguistic feature importance analysis. From natural speech, it extracts linguistic features, including part-of-speech usage, disfluency, and lexical diversity, and applies Random Forest classifiers and regressors. On DementiaBank recordings (N=291), ADRD classification reaches 69.4% sensitivity and 83.3% specificity; on pilot data collected in-residence from older adults (N=22), sensitivity is 70.0% and specificity 52.5%. For MMSE score prediction, mean absolute error is 3.7 points on DementiaBank and 3.3 points on the pilot data. The authors position the approach for in-home integration with conversational AI to monitor cognitive health and triage higher-risk individuals.
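To make the idea of linguistic features concrete, here is a minimal sketch of how a few transcript-level measures (lexical diversity, pronoun rate, filler rate) might be computed. It is not the paper's feature pipeline: the function name `lexical_features`, the tiny `PRONOUNS` and `FILLERS` word lists, and the regex tokeniser are all assumptions chosen for illustration.

```python
import re

# Hypothetical, deliberately tiny word lists (assumptions for illustration only).
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they"}
FILLERS = {"um", "uh", "er", "erm"}


def lexical_features(transcript: str) -> dict:
    """Compute simple per-transcript rates from a plain-text transcript."""
    # Naive tokeniser: lowercase alphabetic runs, keeping apostrophes.
    tokens = re.findall(r"[a-z']+", transcript.lower())
    n = len(tokens)
    if n == 0:
        return {"n_tokens": 0, "type_token_ratio": 0.0,
                "pronoun_rate": 0.0, "filler_rate": 0.0}
    return {
        "n_tokens": n,
        # Type-token ratio: unique words / total words (a lexical diversity proxy).
        "type_token_ratio": len(set(tokens)) / n,
        # Share of tokens that are personal pronouns.
        "pronoun_rate": sum(t in PRONOUNS for t in tokens) / n,
        # Share of tokens that are filled pauses (a crude disfluency proxy).
        "filler_rate": sum(t in FILLERS for t in tokens) / n,
    }


features = lexical_features(
    "um the boy is uh taking the cookie and he he is falling"
)
print(features)
```

A vector of such rates per recording is the kind of input a Random Forest could be trained on; a real system would use a proper part-of-speech tagger and a validated lexicon rather than hand-written word lists.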

📝 Abstract
Timely and accurate assessment of cognitive impairment is a major unmet need in populations at risk. Alterations in speech and language can be early predictors of Alzheimer's disease and related dementias (ADRD) before clinical signs of neurodegeneration. Voice biomarkers offer a scalable and non-invasive solution for automated screening. However, the clinical applicability of machine learning (ML) remains limited by challenges in generalisability, interpretability, and access to patient data to train clinically applicable predictive models. Using DementiaBank recordings (N=291, 64% female), we evaluated ML techniques for ADRD screening and severity prediction from spoken language. We validated model generalisability with pilot data collected in-residence from older adults (N=22, 59% female). Risk stratification and linguistic feature importance analysis enhanced the interpretability and clinical utility of predictions. For ADRD classification, a Random Forest applied to lexical features achieved a mean sensitivity of 69.4% (95% confidence interval (CI) = 66.4-72.5) and specificity of 83.3% (78.0-88.7). On real-world pilot data, this model achieved a mean sensitivity of 70.0% (58.0-82.0) and specificity of 52.5% (39.3-65.7). For severity prediction using Mini-Mental State Examination (MMSE) scores, a Random Forest Regressor achieved a mean absolute MMSE error of 3.7 (3.7-3.8), with comparable performance of 3.3 (3.1-3.5) on pilot data. Linguistic features associated with higher ADRD risk included increased use of pronouns and adverbs, greater disfluency, reduced analytical thinking, lower lexical diversity and fewer words reflecting a psychological state of completion. Our interpretable predictive modelling offers a novel approach for in-home integration with conversational AI to monitor cognitive health and triage higher-risk individuals, enabling earlier detection and intervention.
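The metrics reported in the abstract (sensitivity, specificity, mean absolute MMSE error, and a mean with a 95% confidence interval) can be sketched as follows. The toy labels and scores are invented for illustration, and the abstract does not say how its intervals were computed; the normal approximation over repeated runs used in `mean_with_ci` is an assumption.

```python
from statistics import mean, stdev


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = ADRD."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


def mean_absolute_error(true_scores, predicted_scores):
    """Average absolute difference, e.g. between true and predicted MMSE."""
    return mean(abs(t - p) for t, p in zip(true_scores, predicted_scores))


def mean_with_ci(values, z=1.96):
    """Mean and normal-approximation 95% CI across repeated runs (assumed method)."""
    m = mean(values)
    half_width = z * stdev(values) / len(values) ** 0.5
    return m, m - half_width, m + half_width


# Toy data, not from the paper.
sens, spec = sensitivity_specificity(
    [1, 1, 1, 0, 0, 0, 0, 1],
    [1, 0, 1, 0, 0, 1, 0, 1],
)
mae = mean_absolute_error([28, 22, 17], [25, 24, 20])
ci = mean_with_ci([0.70, 0.68, 0.72])
print(sens, spec, mae, ci)
```

Sensitivity is the share of true ADRD cases flagged and specificity is the share of controls correctly cleared, which is why a model can hold sensitivity near 70% on new data while specificity drops, as the pilot results show.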
Problem

Research questions and friction points this paper is trying to address.

Cognitive Impairment Detection
Alzheimer's Disease Diagnosis
Early Stage Dementia
Innovation

Methods, ideas, or system contributions that make the work stand out.

Machine Learning
Early Detection of Dementia
Random Forest Algorithm
Authors
Maria R. Lima
Imperial College London, UK Dementia Research Institute, Care Research and Technology Centre
Alexander Capstick
Imperial College London, UK Dementia Research Institute, Care Research and Technology Centre
Fatemeh Geranmayeh
Imperial College London, Imperial College Healthcare NHS Trust
R. Nilforooshan
Imperial College London, UK Dementia Research Institute, Care Research and Technology Centre, Great Ormond Street Hospital NHS Foundation Trust, Surrey and Borders Partnership NHS Foundation Trust
Maja Matarić
University of Southern California
Ravi Vaidyanathan
Professor, Imperial College London
Biomechatronics
P. Barnaghi
Imperial College London, UK Dementia Research Institute, Care Research and Technology Centre