🤖 AI Summary
Speech disruptions in psychosis exhibit high phenotypic variability, which undermines predictive robustness. Method: We propose an uncertainty-aware multimodal speech analysis model that introduces modality-level Bayesian uncertainty quantification to psychosis speech modeling for the first time. The model fuses acoustic and linguistic features and employs a task-adaptive gating mechanism to dynamically weight modality contributions, enabling calibrated predictions across both structured and unstructured spoken tasks. Contributions/Results: (1) a novel, interpretable mechanism for dynamic feature weighting; (2) significantly improved cross-task generalization; (3) on 114 participants, reduced RMSE, an F1-score of 83%, and an expected calibration error (ECE) of 0.045; (4) reliable identification of validated speech markers, including pitch variability and fluency disruptions, supporting early detection and personalized clinical assessment.
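The summary above describes uncertainty-driven fusion with a task-adaptive gate. The paper does not specify the fusion equations, so the following is only a minimal illustrative sketch, assuming per-modality predictive variances from a Bayesian head and a scalar task bias; the function name, signature, and inverse-variance weighting scheme are hypothetical, not the authors' implementation:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def fuse_modalities(acoustic_pred, linguistic_pred,
                    acoustic_var, linguistic_var,
                    task_bias=0.0):
    """Hypothetical sketch of uncertainty-aware gated fusion.

    Each modality's prediction is weighted by the inverse of its
    predictive variance (higher uncertainty -> lower weight).
    A task-dependent bias shifts weight toward the acoustic
    modality (task_bias > 0, e.g. structured interviews) or the
    linguistic modality (task_bias < 0, e.g. free narratives).
    """
    logits = np.array([
        -np.log(acoustic_var) + task_bias,
        -np.log(linguistic_var) - task_bias,
    ])
    weights = softmax(logits)                       # gate values, sum to 1
    fused = weights[0] * acoustic_pred + weights[1] * linguistic_pred
    return fused, weights
```

With equal variances and no task bias the gate splits weight evenly; inflating one modality's variance, or biasing toward a task type, shifts the gate accordingly.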
📝 Abstract
Capturing subtle speech disruptions across the psychosis spectrum is challenging because of the inherent variability in speech patterns, which reflects individual differences and the fluctuating nature of symptoms in both clinical and non-clinical populations. Because the speech disruptions characteristic of psychosis appear across the spectrum, including in non-clinical individuals, accounting for uncertainty in speech data is essential for predicting symptom severity and improving diagnostic precision. We develop an uncertainty-aware model that integrates acoustic and linguistic features to predict symptom severity and psychosis-related traits. Quantifying uncertainty at the level of individual modalities allows the model to accommodate speech variability and improve prediction accuracy. We analyzed speech data from 114 participants, including 32 individuals with early psychosis and 82 with low or high schizotypy, collected through structured interviews, semi-structured autobiographical tasks, and narrative-driven interactions in German. The model improved prediction accuracy, reducing RMSE and achieving an F1-score of 83% with an expected calibration error (ECE) of 0.045, and performed robustly across different interaction contexts. Uncertainty estimation also improved interpretability by identifying reliability differences among speech markers such as pitch variability, fluency disruptions, and spectral instability. The model adjusted dynamically to task structure, weighting acoustic features more heavily in structured settings and linguistic features more heavily in unstructured contexts. This approach strengthens early detection, personalized assessment, and clinical decision-making in psychosis-spectrum research.
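The abstract reports calibration via expected calibration error (ECE). For readers unfamiliar with the metric, the standard binned formulation can be sketched as follows; this is the textbook definition, not the paper's code, and the bin count of 10 is an assumption:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: the average absolute gap between accuracy and mean
    confidence within each confidence bin, weighted by the fraction of
    samples falling in that bin. 0.0 means perfectly calibrated."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            accuracy = correct[mask].mean()
            avg_confidence = confidences[mask].mean()
            ece += mask.mean() * abs(accuracy - avg_confidence)
    return ece
```

An ECE of 0.045 therefore means the model's stated confidence deviates from its observed accuracy by about 4.5 percentage points on average.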