Optimizing Speech-Input Length for Speaker-Independent Depression Classification

📅 2019-09-15

🏛️ Interspeech

📈 Citations: 15

✨ Influential: 0

🤖 AI Summary

This study investigates how speech duration affects speaker-independent automatic depression classification. Method: Using over 1,400 hours of speech from a human-machine health screening application, the authors evaluate two NLP systems of differing overall performance in a speaker-independent setup and analyze performance as a function of response length. Contribution/Results: The study identifies two duration thresholds for depression classification. The first is a minimum effective duration (~3 seconds), which both systems share. The second is a performance saturation duration, which is longer for the better-performing system. Performance also depends on a response's natural length, its elapsed length, and its position within the session. Beyond saturation, posing a new question yields greater gains than extending the current response, which informs the interaction design of deployed screening applications. These findings offer empirical guidance for building speech-based depression screening systems that elicit and process responses of effective length.
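
The two-threshold finding can be made concrete with a small analysis sketch. It assumes per-response classifier scores, binary depression labels, and response lengths in seconds. It bins responses by natural length, computes ROC AUC per bin, and reads off a minimum threshold (the first bin reaching an AUC floor) and a saturation threshold (the first bin after which the AUC gain falls below a tolerance). The metric, bin edges, and both criteria (`floor`, `min_gain`) are illustrative assumptions rather than the paper's definitions, and all function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via the Mann-Whitney U statistic (ties count as half a win)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_length(
    lengths: Sequence[float],
    scores: Sequence[float],
    labels: Sequence[int],
    bin_edges: Sequence[float],
) -> List[Tuple[float, float]]:
    """AUC per length bin [lo, hi); bins lacking either class are skipped."""
    curve = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        idx = [i for i, length in enumerate(lengths) if lo <= length < hi]
        ys = [labels[i] for i in idx]
        if 0 in ys and 1 in ys:
            curve.append((lo, auc([scores[i] for i in idx], ys)))
    return curve


def find_thresholds(
    curve: Sequence[Tuple[float, float]],
    floor: float = 0.55,
    min_gain: float = 0.01,
) -> Tuple[Optional[float], Optional[float]]:
    """Return (minimum, saturation) bin starts from a length-sorted AUC curve.

    minimum:    first bin whose AUC reaches `floor` (hypothetical criterion).
    saturation: first bin at or past the minimum after which the next bin
                improves AUC by less than `min_gain` (hypothetical criterion).
    """
    minimum = next((start for start, a in curve if a >= floor), None)
    if minimum is None:
        return None, None
    for (start, a0), (_, a1) in zip(curve, curve[1:]):
        if start >= minimum and a1 - a0 < min_gain:
            return minimum, start
    return minimum, None


# Synthetic curve for illustration only (not data from the paper).
curve = [(0, 0.50), (3, 0.60), (6, 0.66), (9, 0.70), (12, 0.705), (15, 0.71)]
print(find_thresholds(curve))  # (3, 9)
```

Binning by natural length mirrors one of the factors the abstract lists; measuring elapsed length instead would mean re-scoring truncated prefixes of each response, which this sketch does not do.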

📝 Abstract

Machine learning models for speech-based depression classification offer promise for health care applications. Despite growing work on depression classification, little is understood about how the length of speech input impacts model performance. We analyze results for speaker-independent depression classification using a corpus of over 1400 hours of speech from a human-machine health screening application. We examine performance as a function of response input length for two NLP systems that differ in overall performance. Results for both systems show that performance depends on natural length, elapsed length, and ordering of the response within a session. The systems share a minimum length threshold but differ in a response saturation threshold, with the latter higher for the better system. At saturation, it is better to pose a new question to the speaker than to continue the current response. These and additional reported results suggest how applications can be better designed to both elicit and process optimal input lengths for depression classification.
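
The closing design suggestion, that applications should both elicit and process suitable input lengths, can be sketched as a simple turn-taking policy. This is a hypothetical illustration rather than the system studied: the class name, the action labels, the follow-up behavior for short answers, and the rule for dropping short responses before scoring are all assumptions. The two thresholds would have to be estimated separately for each classifier, since the abstract reports that the saturation threshold differs between systems.

```python
from dataclasses import dataclass


@dataclass
class ElicitationPolicy:
    """Hypothetical turn-taking policy built on two duration thresholds (seconds)."""

    min_seconds: float         # below this, a response is assumed too short to classify
    saturation_seconds: float  # past this, extra speech on the same prompt adds little

    def action(self, elapsed_seconds: float, speaker_finished: bool) -> str:
        """Decide what the screening app does next for the current response."""
        if elapsed_seconds >= self.saturation_seconds:
            return "next_question"   # saturated: a new question is worth more
        if not speaker_finished:
            return "keep_listening"
        if elapsed_seconds < self.min_seconds:
            return "follow_up"       # hypothetical: ask the speaker to elaborate
        return "next_question"

    def keep_for_scoring(self, duration_seconds: float) -> bool:
        """Processing side (assumed rule): drop responses below the minimum threshold."""
        return duration_seconds >= self.min_seconds


policy = ElicitationPolicy(min_seconds=3.0, saturation_seconds=20.0)  # placeholder values
print(policy.action(25.0, speaker_finished=False))  # next_question
print(policy.action(2.0, speaker_finished=True))    # follow_up
print(policy.keep_for_scoring(2.5))                 # False
```

Moving on at saturation reflects the reported result that a new question is more useful than a longer answer; the threshold values above are for demonstration only.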

Problem

Research questions and friction points this paper is trying to address.

Depression Detection

Speech Duration

Machine Learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Depression Detection

Speech Duration

Machine Learning Optimization

🔎 Similar Papers

A Frame-based Attention Interpretation Method for Relevant Acoustic Feature Extraction in Long Speech Depression Detection