MDD-5k: A New Diagnostic Conversation Dataset for Mental Disorders Synthesized via Neuro-Symbolic LLM Agents

📅 2024-08-22
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Mental health research has long suffered from a scarcity of high-quality, privacy-compliant Chinese clinical dialogue data. To address this, we propose a neuro-symbolic collaborative doctor-patient dual-agent framework that leverages 1,000 de-identified real-world clinical cases, integrating dynamic diagnostic trees with large language models to generate MDD-5k: the first large-scale Chinese diagnostic dialogue dataset for mental disorders (5,000 long, multi-turn dialogues). MDD-5k introduces fine-grained annotations including structured diagnostic conclusions and treatment recommendations, ensuring high fidelity, interpretability, and linguistic diversity. Human evaluation confirms its diagnostic reasoning faithfully mirrors clinical practice, while downstream experiments demonstrate significant improvements in both accuracy and generalization of AI-based psychological assessment models. MDD-5k establishes a critical foundation for privacy-preserving, controllable, and trustworthy AI-powered mental health services.


πŸ“ Abstract
The clinical diagnosis of most mental disorders relies primarily on conversations between psychiatrists and patients. Creating datasets of such diagnostic conversations promises to boost the AI mental healthcare community. However, directly collecting conversations from real diagnostic scenarios is nearly impossible due to stringent privacy and ethical constraints. To address this issue, we synthesize diagnostic conversations from anonymized patient cases, which are easier to access. Specifically, we design a neuro-symbolic multi-agent framework that uses large language models to synthesize diagnostic conversations for mental disorders. It takes a patient case as input and can generate multiple diverse conversations from a single patient case. The framework involves the interaction between a doctor agent and a patient agent, and generates conversations under symbolic control via a dynamic diagnosis tree. By applying the proposed framework, we develop MDD-5k, the largest Chinese mental disorder diagnosis dataset. The dataset is built from 1,000 real, anonymized patient cases collected in cooperation with the Shanghai Mental Health Center and comprises 5,000 high-quality long conversations labeled with diagnosis results and treatment opinions. To the best of our knowledge, it is also the first labeled dataset for Chinese mental disorder diagnosis. Human evaluation demonstrates that MDD-5k successfully simulates the human-like diagnostic process for mental disorders.
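The doctor-patient loop under symbolic control can be pictured with a minimal sketch. This is not the authors' implementation: rule-based stubs stand in for the LLM doctor and patient agents, and `DiagnosisNode`, `PatientCase`, `synthesize_conversation`, `synthesize_dataset`, the expansion rule (follow-up topics are asked only when the case reports the parent symptom), and the toy case are all hypothetical. Varying the random seed shuffles the order of follow-up questions, which illustrates one way a single case can yield several distinct conversations (5,000 conversations from 1,000 cases averages five per case).

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field


@dataclass
class DiagnosisNode:
    """Hypothetical diagnosis-tree node: one topic the doctor agent should ask about."""
    topic: str
    children: list[DiagnosisNode] = field(default_factory=list)


@dataclass
class PatientCase:
    """Hypothetical anonymized case: topic -> reported symptom, plus a diagnosis label."""
    case_id: str
    facts: dict[str, str]
    diagnosis: str


def doctor_turn(node: DiagnosisNode) -> str:
    # Stub for the LLM doctor agent: the tree (symbolic control) decides the topic.
    return f"Could you tell me about your {node.topic}?"


def patient_turn(node: DiagnosisNode, case: PatientCase) -> str:
    # Stub for the LLM patient agent: answers are grounded in the case record.
    return case.facts.get(node.topic, "I haven't noticed anything like that.")


def synthesize_conversation(case: PatientCase, root: DiagnosisNode, seed: int) -> list[tuple[str, str]]:
    """Walk the tree depth-first, expanding a node's children only if the symptom is present."""
    rng = random.Random(seed)
    turns: list[tuple[str, str]] = []
    stack = [root]
    while stack:
        node = stack.pop()
        turns.append(("doctor", doctor_turn(node)))
        turns.append(("patient", patient_turn(node, case)))
        if node.topic in case.facts:  # dynamic part: the tree grows with the answers
            children = list(node.children)
            rng.shuffle(children)  # different seeds -> different question orders
            stack.extend(children)
    turns.append(("doctor", f"Preliminary diagnosis: {case.diagnosis}"))
    return turns


def synthesize_dataset(cases: list[PatientCase], root: DiagnosisNode, per_case: int = 5) -> list[dict]:
    """Generate `per_case` conversations per case, each labeled with the case diagnosis."""
    return [
        {"case_id": c.case_id, "turns": synthesize_conversation(c, root, seed=k), "label": c.diagnosis}
        for c in cases
        for k in range(per_case)
    ]


# Toy example (hypothetical tree and case, not taken from MDD-5k).
tree = DiagnosisNode("chief complaint", [
    DiagnosisNode("low mood", [DiagnosisNode("sleep"), DiagnosisNode("appetite")]),
    DiagnosisNode("worry", [DiagnosisNode("panic attacks")]),
])
case = PatientCase(
    "demo-001",
    {"chief complaint": "I've felt down for two months.",
     "low mood": "Most days, yes.",
     "sleep": "I wake up at 4 a.m. and can't fall asleep again."},
    "depressive episode (placeholder label)",
)

if __name__ == "__main__":
    for speaker, utterance in synthesize_conversation(case, tree, seed=0):
        print(f"{speaker}: {utterance}")
```

Because the case does not report worry, the "panic attacks" branch is never opened; in the actual framework an LLM would phrase each question and answer, while the tree keeps the diagnostic coverage controllable.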
Problem

Research questions and friction points this paper is trying to address.

Privacy-Preserving
Mental Health
AI Diagnosis
Innovation

Methods, ideas, or system contributions that make the work stand out.

MDD-5k dataset
Chinese mental health diagnosis
large language models and rule tree
Congchi Yin
Shanda Group, Shanghai, China; Chen Frontier Lab for AI and Mental Health, Tianqiao and Chrissy Chen Institute, Shanghai, China
Feng Li
Shanda Group, Shanghai, China
Shu Zhang
Shanda Group, Shanghai, China
Zike Wang
Shanda Group, Shanghai, China
Jun Shao
Professor of Statistics, University of Wisconsin-Madison
Statistics
Piji Li
Shanda Group, Shanghai, China
Jianhua Chen
Shanghai Mental Health Center; Shanghai Jiao Tong University School of Medicine; Shanghai Clinical Research Center for Mental Health; Shanghai Key Laboratory of Psychotic Disorders
Xun Jiang
Shanda Group, Shanghai, China