A multimodal Bayesian Network for symptom-level depression and anxiety prediction from voice and speech data

📅 2025-12-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the clinical need for integrating nonverbal information into psychiatric assessment by proposing a symptom-level Bayesian modeling framework that fuses acoustic, prosodic, and linguistic features from speech to quantify depression and anxiety symptom severity, rather than producing a binary diagnostic classification. The method introduces an interpretable, clinically grounded probabilistic model designed for cross-population fairness and calibration reliability. Evaluated on a large-scale dataset comprising over 30,000 speakers, the model achieves AUCs of 0.842 and 0.831 for predicting total depression and anxiety scores, respectively, with all core symptom dimensions exceeding AUC = 0.74. Validation using both discriminative (ROC-AUC) and calibration (Expected Calibration Error, ECE) metrics supports the model's clinical utility and reliability.
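The summary above evaluates calibration with Expected Calibration Error (ECE). The paper does not include code, but the standard binned ECE, the weighted average gap between a bin's mean predicted probability and its observed accuracy, can be sketched as follows (bin count and data are illustrative assumptions):

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE: sum over bins of (bin weight) * |accuracy - confidence|."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        # include the left edge only for the first bin
        mask = (probs >= lo) & (probs <= hi) if i == 0 else (probs > lo) & (probs <= hi)
        if not mask.any():
            continue
        confidence = probs[mask].mean()   # mean predicted probability in bin
        accuracy = labels[mask].mean()    # observed positive rate in bin
        ece += mask.mean() * abs(accuracy - confidence)
    return ece
```

A well-calibrated model (e.g. the reported ECE of 0.015-0.018) has predicted probabilities that closely track observed outcome rates within each bin.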

📝 Abstract
During psychiatric assessment, clinicians observe not only what patients report, but also important nonverbal signs such as tone, speech rate, fluency, responsiveness, and body language. Weighing and integrating these different information sources is a challenging task and a good candidate for support by intelligence-driven tools; however, this is yet to be realized in the clinic. Here, we argue that several important barriers to adoption can be addressed using Bayesian network modelling. To demonstrate this, we evaluate a model for depression and anxiety symptom prediction from voice and speech features in large-scale datasets (30,135 unique speakers). Alongside performance for conditions and symptoms (for depression and anxiety, ROC-AUC = 0.842 and 0.831, ECE = 0.018 and 0.015; core individual symptom ROC-AUC > 0.74), we assess demographic fairness and investigate integration across, and redundancy between, different input modality types. Clinical usefulness metrics and acceptability to mental health service users are explored. When provided with sufficiently rich and large-scale multimodal data streams, and specified to represent common mental conditions at the symptom rather than disorder level, such models are a principled approach for building robust assessment support tools: providing clinically relevant outputs in a transparent and explainable format that is directly amenable to expert clinical supervision.
Problem

Research questions and friction points this paper is trying to address.

Predicting depression and anxiety symptoms from voice and speech data
Integrating multimodal information for clinical assessment support
Ensuring demographic fairness and clinical usefulness in prediction models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian network modeling for multimodal data integration
Symptom-level prediction from voice and speech features
Transparent and explainable clinical assessment support tools
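The innovation here is a Bayesian network that fuses speech-derived evidence into symptom-level probabilities. A minimal toy sketch of the idea, with a single binary symptom node and two conditionally independent speech-feature children (all node names and probabilities are hypothetical illustrations, not the paper's model):

```python
# Toy Bayesian network: latent symptom S ("low mood") with two
# conditionally independent binary speech-feature observations:
# F1 = reduced prosodic variation, F2 = slowed speech rate.
# Probabilities are illustrative only.
p_s = 0.30                         # prior P(S=1)
p_f1_given_s = {1: 0.70, 0: 0.20}  # P(F1=1 | S=s)
p_f2_given_s = {1: 0.60, 0: 0.25}  # P(F2=1 | S=s)

def posterior(f1, f2):
    """P(S=1 | F1=f1, F2=f2) by exact enumeration over S."""
    def likelihood(s):
        p1 = p_f1_given_s[s] if f1 else 1 - p_f1_given_s[s]
        p2 = p_f2_given_s[s] if f2 else 1 - p_f2_given_s[s]
        return p1 * p2
    numerator = p_s * likelihood(1)
    return numerator / (numerator + (1 - p_s) * likelihood(0))
```

Because the output is an explicit posterior probability per symptom, a clinician can inspect which evidence raised or lowered it, which is the transparency property the paper emphasizes.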
Agnes Norbury
Thymia Limited
cognitive neuroscience, computational psychiatry, digital mental health
George Fairs
thymia Limited, London, UK
Alexandra L. Georgescu
thymia Limited, London, UK; Institute of Psychiatry, Psychology & Neuroscience, King’s College London, London, UK
Matthew M. Nour
Department of Psychiatry, University of Oxford, Oxford, UK; Max Planck UCL Centre for Computational Psychiatry and Ageing, University College London, London, UK
Emilia Molimpakis
thymia Limited, London, UK
Stefano Goria
thymia Limited, London, UK