From Medical Records to Diagnostic Dialogues: A Clinical-Grounded Approach and Dataset for Psychiatric Comorbidity

📅 2025-10-29

🤖 AI Summary

Diagnosing psychiatric comorbidity is difficult because multiple co-occurring disorders interact. This paper presents PsyCoTalk, described as the first large-scale, multi-turn dialogue dataset that supports comorbidity, with 3,000 diagnostic dialogues built from 502 synthetic electronic medical records (EMRs). The method combines a synthesis pipeline for clinically relevant and diverse EMRs with a multi-agent dialogue framework that maps the clinical interview protocol onto a hierarchical state machine and a context tree covering over 130 diagnostic states. Psychiatrists validated the dialogues. Compared with real-world clinical transcripts, PsyCoTalk shows high fidelity in dialogue length, token distribution, and diagnostic reasoning strategies, and it supports developing and evaluating models for multi-disorder psychiatric screening in a single conversation.

📝 Abstract

Psychiatric comorbidity is clinically significant yet challenging due to the complexity of multiple co-occurring disorders. To address this, we develop a novel approach integrating synthetic patient electronic medical record (EMR) construction and multi-agent diagnostic dialogue generation. We create 502 synthetic EMRs for common comorbid conditions using a pipeline that ensures clinical relevance and diversity. Our multi-agent framework transfers the clinical interview protocol into a hierarchical state machine and context tree, supporting over 130 diagnostic states while maintaining clinical standards. Through this rigorous process, we construct PsyCoTalk, the first large-scale dialogue dataset supporting comorbidity, containing 3,000 multi-turn diagnostic dialogues validated by psychiatrists. This dataset enhances diagnostic accuracy and treatment planning, offering a valuable resource for psychiatric comorbidity research. Compared to real-world clinical transcripts, PsyCoTalk exhibits high structural and linguistic fidelity in terms of dialogue length, token distribution, and diagnostic reasoning strategies. Licensed psychiatrists confirm the realism and diagnostic validity of the dialogues. This dataset enables the development and evaluation of models capable of multi-disorder psychiatric screening in a single conversational pass.
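
The abstract names a hierarchical state machine and a context tree but gives no implementation detail. The following is a minimal sketch of how such a pairing could look, not the authors' method: the disorder modules, state names, `ContextNode`, and `run_interview` are all hypothetical, and real interview protocols branch far more than this linear walk.

```python
from dataclasses import dataclass, field


@dataclass
class ContextNode:
    """One node of a hypothetical context tree: a topic plus recorded findings."""
    topic: str
    findings: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    def child(self, topic):
        """Return the child node for `topic`, creating it on first use."""
        for node in self.children:
            if node.topic == topic:
                return node
        node = ContextNode(topic)
        self.children.append(node)
        return node


# Top level: which disorder module is active. Second level: states inside it.
# Module and state names are illustrative assumptions, not taken from the paper.
MODULES = {
    "depression": ["screen", "duration", "impairment"],
    "anxiety": ["screen", "duration", "impairment"],
}


def run_interview(answers):
    """Walk every module; leave a module as soon as one criterion is not met."""
    root = ContextNode("interview")
    diagnoses = []
    for disorder, states in MODULES.items():
        node = root.child(disorder)
        for state in states:
            positive = answers.get((disorder, state), False)
            node.findings[state] = positive
            if not positive:
                break  # criterion not met: skip the rest of this module
        else:
            diagnoses.append(disorder)  # every state was answered positively
    return diagnoses, root
```

Because each module is visited in the same pass, several disorders can be confirmed in one conversation, and the context tree keeps the findings for each disorder separate so later turns can refer back to them.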

Problem

Research questions and friction points this paper is trying to address.

Developing synthetic EMRs and multi-agent dialogues for psychiatric comorbidity

Creating the PsyCoTalk dataset of 3,000 psychiatrist-validated diagnostic dialogues

Enhancing multi-disorder psychiatric screening through conversational AI models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Synthetic EMR construction pipeline designed for clinical relevance and diversity

Multi-agent framework with a hierarchical state machine and context tree

Generating PsyCoTalk, a large-scale diagnostic dialogue dataset
