🤖 AI Summary
This work addresses data scarcity and unreliable self-reports (such as concealment or exaggeration) in initial psychiatric evaluations, both of which significantly compromise diagnostic accuracy. To this end, we propose the first honesty-aware multi-agent synthesis framework, which generates an end-to-end, high-fidelity dataset encompassing self-reports, observer ratings, clinical interviews, and diagnostic formulations via a four-role workflow of simulated patients, assessors, evaluators, and diagnosticians. Built upon the DAIC-WOZ dataset, the framework combines multi-agent simulation, chain-of-thought reasoning, clinical scale-driven dialogue generation, and large language model (LLM) evaluation to controllably model patient deception. The resulting corpus spans multiple disorder types and severity levels and has been validated by both human and LLM assessments for diagnostic consistency, clinical authenticity, and fidelity in modeling deceptive behavior, supporting honesty-aware evaluation and adaptive psychiatric dialogue systems.
📝 Abstract
Data scarcity and unreliable self-reporting (such as concealment or exaggeration) pose fundamental challenges to psychiatric intake and assessment. We propose a multi-agent synthesis framework that explicitly models patient deception to generate high-fidelity, publicly releasable synthetic psychiatric intake records. Starting from DAIC-WOZ interviews, we construct enriched patient profiles and simulate a four-role workflow: a *Patient* completes self-rated scales and participates in a semi-structured interview under a topic-dependent honesty state; an *Assessor* selects instruments based on demographics and chief complaints; an *Evaluator* conducts the interview grounded in rater-administered scales, tracks suspicion, and completes ratings; and a *Diagnostician* integrates all evidence into a diagnostic summary. Each case links the patient profile, self-rated and rater-administered responses, the interview transcript, the diagnostic summary, and the honesty state. We validate the framework through four complementary evaluations: diagnostic consistency and severity grading, chain-of-thought ablations, human evaluation of clinical realism and dishonesty modeling, and LLM-based comparative evaluation. The resulting corpus spans multiple disorders and severity levels, enabling controlled study of dishonesty-aware psychiatric assessment and the training and evaluation of adaptive dialogue agents.
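The four-role workflow above can be sketched as a simple pipeline. This is a minimal illustrative sketch, not the authors' implementation: all class names, function names, and the specific scales (PHQ-9, GAD-7) and honesty labels are assumptions introduced here for clarity.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical sketch of the four-role synthesis workflow. Role names follow
# the abstract; every interface below is an illustrative assumption.

@dataclass
class PatientProfile:
    demographics: Dict[str, str]
    chief_complaint: str
    # topic -> "honest" | "conceal" | "exaggerate" (topic-dependent honesty state)
    honesty_state: Dict[str, str]

def assessor_select_scales(profile: PatientProfile) -> List[str]:
    """Assessor: choose instruments from demographics and chief complaint."""
    scales = ["PHQ-9"]  # self-rated depression screen (illustrative default)
    if "anxiety" in profile.chief_complaint.lower():
        scales.append("GAD-7")
    return scales

def patient_self_report(profile: PatientProfile, scales: List[str]) -> Dict[str, str]:
    """Patient: answer each scale under the topic-dependent honesty state."""
    return {s: profile.honesty_state.get(s.lower(), "honest") for s in scales}

def evaluator_interview(profile: PatientProfile,
                        responses: Dict[str, str]) -> Tuple[List[str], int]:
    """Evaluator: interview grounded in rater-administered scales; track suspicion."""
    suspicion = sum(v != "honest" for v in responses.values())
    transcript = [f"Q: Tell me more about your {profile.chief_complaint}."]
    return transcript, suspicion

def diagnostician_summary(profile, responses, transcript, suspicion) -> Dict:
    """Diagnostician: integrate all evidence into one linked case record."""
    return {
        "profile": profile,
        "self_report": responses,
        "transcript": transcript,
        "suspicion": suspicion,
    }

# One synthetic case, end to end
case_profile = PatientProfile(
    demographics={"age": "29"},
    chief_complaint="low mood and anxiety",
    honesty_state={"phq-9": "conceal"},
)
scales = assessor_select_scales(case_profile)
answers = patient_self_report(case_profile, scales)
transcript, suspicion = evaluator_interview(case_profile, answers)
record = diagnostician_summary(case_profile, answers, transcript, suspicion)
```

In the released framework each role is played by an LLM agent rather than these stub functions, but the data flow (profile → instruments → honesty-conditioned responses → interview with suspicion tracking → diagnostic summary) mirrors the case structure described in the abstract.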