Explaining GPT-4's Schema of Depression Using Machine Behavior Analysis

2024-11-21
arXiv.org
Citations: 2
Influential: 0
AI Summary
This study addresses the opaque way large language models (LLMs), particularly GPT-4, represent symptoms when assessing depression. We present a systematic decoding of the depressive symptom structure GPT-4 produces, using a machine behavior analysis framework that integrates item response theory, symptom correlation modeling, expert judgments, and validation against large-scale self-report datasets. Our analysis uncovers GPT-4's clinical reasoning patterns and biases: it underemphasizes the relationship of suicidality with other symptoms and overemphasizes that of psychomotor symptoms, and its symptom inference patterns suggest hypotheses about the direction of influence between symptoms. The model demonstrates high convergent validity (*r* = .71 with self-report; *r* = .81 with expert judgments) and moderately high internal consistency. This work offers a methodological foundation and empirical evidence for interpretable, clinically aligned LLM evaluation in mental health applications.

๐Ÿ“ Abstract
Use of large language models such as ChatGPT (GPT-4) for mental health support has grown rapidly, emerging as a promising route to assess and help people with mood disorders such as depression. However, we have a limited understanding of GPT-4's schema of mental disorders, that is, how it internally associates and interprets symptoms. In this work, we leveraged contemporary measurement theory to decode how GPT-4 interrelates depressive symptoms, to inform both clinical utility and theoretical understanding. We found that GPT-4's assessment of depression: (a) had high overall convergent validity (r = .71 with self-report on 955 samples, and r = .81 with expert judgments on 209 samples); (b) had moderately high internal consistency (symptom inter-correlations r = .23 to .78) that largely aligned with the literature and self-report; except that GPT-4 (c) underemphasized the relationship of suicidality, and overemphasized that of psychomotor symptoms, with other symptoms, and (d) had symptom inference patterns that suggest nuanced hypotheses (e.g., sleep and fatigue are influenced by most other symptoms, while feelings of worthlessness/guilt are mostly influenced by depressed mood).
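The convergent validity figures in the abstract are correlations between GPT-4's depression scores and a reference measure (self-report or expert ratings). As a minimal sketch of that kind of check, assuming a plain Pearson correlation over per-person total scores, the snippet below uses made-up toy numbers; the variable names and data are illustrative and do not come from the paper.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: one total depression score per person.
model_scores = [3, 10, 14, 7, 20, 5]   # scores assigned by the model (assumed)
self_report = [4, 9, 16, 5, 18, 8]     # questionnaire totals (assumed)

r = pearson_r(model_scores, self_report)
print(f"convergent validity r = {r:.2f}")
```

A value near 1 would indicate that the model ranks people much as the reference measure does; the paper's reported values (.71 and .81) come from its own data, not from this toy example.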
Problem

Research questions and friction points this paper is trying to address.

Analyzing how GPT models internally associate depression symptoms
Evaluating GPT-4's symptom correlations against clinical standards
Identifying biases in GPT models' interpretation of depression symptoms
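One way to picture the comparison of symptom correlations described above is to compute every pairwise symptom correlation under the model and under a reference source, then look at the gaps. The sketch below assumes Pearson correlations over per-person symptom scores; the symptom names, the data, and the helper names `pairwise_correlations` and `correlation_gaps` are hypothetical and are not the authors' procedure.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(scores):
    """Map each symptom pair (a, b) to the correlation of their score lists.

    `scores` maps a symptom name to a list of per-person scores.
    """
    return {
        (a, b): pearson_r(scores[a], scores[b])
        for a, b in combinations(sorted(scores), 2)
    }


def correlation_gaps(model_scores, reference_scores):
    """Model correlation minus reference correlation for each shared pair."""
    model = pairwise_correlations(model_scores)
    reference = pairwise_correlations(reference_scores)
    return {pair: model[pair] - reference[pair] for pair in model if pair in reference}


# Hypothetical toy data: per-person symptom scores from two sources.
gpt = {"mood": [0, 1, 2, 3, 3], "sleep": [1, 1, 2, 3, 2], "suicidality": [0, 0, 1, 0, 1]}
selfrep = {"mood": [0, 1, 2, 3, 3], "sleep": [0, 2, 2, 3, 3], "suicidality": [0, 0, 1, 1, 2]}

gaps = correlation_gaps(gpt, selfrep)
for pair, gap in sorted(gaps.items()):
    print(pair, f"{gap:+.2f}")
```

In this framing, a negative gap for pairs involving a symptom means the model ties that symptom to the others less strongly than the reference does, which is the sense of "underemphasized" used in the abstract.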
Innovation

Methods, ideas, or system contributions that make the work stand out.

Used measurement theory to decode GPT symptom interrelations
Analyzed machine behavior to explain LLM symptom associations
Provided empirical foundation for mental health model explainability
Adithya V Ganesan
Stony Brook University
Natural Language Processing, Computational Social Science
Vasudha Varadarajan
Carnegie Mellon University
natural language processing, computational social science
Yash Kumar Lal
Department of Computer Science, Stony Brook University, USA.
Veerle C. Eijsbroek
Department of Psychology, Lund University, Sweden.
Katarina Kjell
Department of Psychology, Lund University, Sweden.
O. Kjell
Department of Computer Science, Stony Brook University, USA; Department of Psychology, Lund University, Sweden.
Tanuja Dhanasekaran
Independent Researcher, USA.
Elizabeth C. Stade
Department of Psychology, Stanford University, USA.
J. Eichstaedt
Department of Psychology, Stanford University, USA; Institute for Human-Centered AI, Stanford University, USA.
Ryan L. Boyd
Department of Psychology, University of Texas at Dallas
computational social science, text analysis, social/personality psychology, behavior, emotion
H. A. Schwartz
Department of Computer Science, Stony Brook University, USA.
Lucie Flek
University of Bonn, Lamarr Institute of Machine Learning and Artificial Intelligence
Natural Language Processing, Machine Learning, Physics, Computational Social Sciences