🤖 AI Summary
This study addresses the limitations of binary schizophrenia diagnosis by developing an interpretable speech-based biomarker centered on articulatory coordination. We propose the weighted sum with exponential decay (WSED), a metric derived from eigenspectra difference plots of articulatory speech features, to quantify vocal tract coordination during speech. WSED is modeled as a continuous biomarker: it correlates significantly with total Brief Psychiatric Rating Scale (BPRS) scores and with the balance between positive and negative symptoms (r = −0.68, p < 0.001), and it differentiates coordination patterns across symptom-dominant clinical states. Unlike conventional classification models, WSED offers transparency, sensitivity to severity, and direct clinical interpretability. It provides an objective, non-invasive quantitative tool for assessing symptom severity, advancing translational applications of speech biomarkers in psychiatric evaluation.
📝 Abstract
Advances in artificial intelligence (AI) and deep learning have improved diagnostic capabilities in healthcare, yet limited interpretability continues to hinder clinical adoption. Schizophrenia, a complex disorder with diverse symptoms including disorganized speech and social withdrawal, demands tools that capture symptom severity and provide clinically meaningful insights beyond binary diagnosis. Here, we present an interpretable framework that leverages articulatory speech features through eigenspectra difference plots and a weighted sum with exponential decay (WSED) to quantify vocal tract coordination. Eigenspectra plots effectively distinguished complex from simpler coordination patterns, and WSED scores reliably separated these groups, with ambiguity confined to a narrow range near zero. Importantly, WSED scores correlated not only with overall severity on the Brief Psychiatric Rating Scale (BPRS) but also with the balance between positive and negative symptoms, reflecting more complex coordination in subjects with pronounced positive symptoms and the opposite trend for stronger negative symptoms. This approach offers a transparent, severity-sensitive biomarker for schizophrenia, advancing the potential for clinically interpretable speech-based assessment tools.
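As a rough illustration of what a "weighted sum with exponential decay" over an eigenspectra difference might look like, the sketch below collapses a vector of rank-ordered eigenvalue differences into a single signed score. The abstract does not give the exact formula, so the function name `wsed`, the `decay` parameter, the weight form `exp(-decay * i)`, and the choice to give the lowest index the largest weight are all assumptions made for illustration, not the authors' definition. The input values are toy numbers, not data from the study.

```python
import math


def wsed(eigen_diff, decay=0.1):
    """Collapse an eigenspectrum difference into one signed score.

    eigen_diff: differences between two rank-ordered eigenspectra
                (index 0 is assumed to be the largest eigenvalue).
    decay:      hypothetical decay rate; index i gets weight exp(-decay * i).
    """
    return sum(math.exp(-decay * i) * d for i, d in enumerate(eigen_diff))


# Toy inputs only: an all-zero difference and a mixed-sign difference.
no_difference = [0.0, 0.0, 0.0, 0.0]
mixed_difference = [-0.4, -0.1, 0.2, 0.3]

print(wsed(no_difference))
print(wsed(mixed_difference, decay=0.5))
```

Under these assumptions an all-zero difference yields a score of zero, which matches the abstract's remark that ambiguous cases cluster in a narrow range near zero; the sign and magnitude of the score away from zero are what would separate the coordination patterns.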