🤖 AI Summary
Current LLM-based psychological support methods lack clinical grounding: their diagnostic reasoning fails to align with DSM/ICD criteria, and their therapeutic strategies are overly simplistic, unable to accommodate diverse evidence-based modalities such as CBT and ACT. To address these limitations, we propose PsyLLM, the first clinically aligned large language model for mental health support. Our approach comprises: (1) an automated clinical data synthesis pipeline that integrates diagnostic guideline constraints with prompt engineering for cross-therapeutic reasoning; (2) a novel four-dimensional evaluation benchmark assessing comprehensiveness, clinical professionalism, factual authenticity, and safety; and (3) end-to-end joint modeling of diagnosis and multi-modal treatment planning. Evaluated on our expert-curated benchmark, PsyLLM achieves a 32% absolute improvement in diagnostic accuracy over prior SOTA models and attains a therapeutic-recommendation professionalism score of 4.78/5.0, significantly surpassing conventional empathic-response paradigms.
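The four-dimensional evaluation protocol could be sketched roughly as follows. This is a minimal illustration only: the `judge` callable, the 1-5 score scale, and the function names are assumptions for the sketch, not details taken from the paper.

```python
# Hypothetical sketch of a four-dimension counseling-quality evaluation.
# The judge callable (e.g. an LLM rater) and the 1-5 scale are assumed.
DIMENSIONS = ("comprehensiveness", "professionalism", "authenticity", "safety")

def score_response(judge, response):
    """Score one counselor response on each dimension via the judge."""
    return {dim: judge(response, dim) for dim in DIMENSIONS}

def aggregate(per_response_scores):
    """Average per-dimension scores across a set of scored responses."""
    n = len(per_response_scores)
    return {dim: sum(s[dim] for s in per_response_scores) / n
            for dim in DIMENSIONS}
```

A benchmark run would score every model response with `score_response` and report the `aggregate` averages per dimension, which is how a single figure such as a professionalism score out of 5.0 could be obtained.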
📝 Abstract
Large language models (LLMs) hold significant potential for mental health support, capable of generating empathetic responses and simulating therapeutic conversations. However, existing LLM-based approaches often lack the clinical grounding necessary for real-world psychological counseling, particularly explicit diagnostic reasoning aligned with standards such as the DSM/ICD and the incorporation of diverse therapeutic modalities beyond basic empathy or single strategies. To address these critical limitations, we propose PsyLLM, the first large language model designed to systematically integrate both diagnostic and therapeutic reasoning for mental health counseling. To develop PsyLLM, we propose a novel automated data synthesis pipeline. This pipeline processes real-world mental health posts, generates multi-turn dialogue structures, and leverages LLMs guided by international diagnostic standards (e.g., DSM/ICD) and multiple therapeutic frameworks (e.g., CBT, ACT, psychodynamic therapy) to simulate detailed clinical reasoning processes. Rigorous multi-dimensional filtering ensures the generation of high-quality, clinically aligned dialogue data. In addition, we introduce a new benchmark and evaluation protocol that assess counseling quality across four key dimensions: comprehensiveness, professionalism, authenticity, and safety. Our experiments demonstrate that PsyLLM significantly outperforms state-of-the-art baseline models on this benchmark.
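The synthesis pipeline described above (posts in, guideline-constrained multi-turn dialogues out, followed by filtering) could be sketched as below. All function names, the stubbed `llm` callable, and the filtering thresholds are illustrative assumptions; the paper's actual pipeline and prompts are not reproduced here.

```python
# Hypothetical sketch of the automated data-synthesis pipeline:
# real-world posts -> guideline-constrained multi-turn dialogues -> filtering.
DIAGNOSTIC_GUIDES = ("DSM-5", "ICD-11")          # assumed guideline set
THERAPY_FRAMEWORKS = ("CBT", "ACT", "psychodynamic")  # assumed frameworks

def generate_dialogue(post, llm):
    """Expand one mental-health post into turns, one per therapeutic framework."""
    turns = []
    for framework in sorted(THERAPY_FRAMEWORKS):
        prompt = (f"Post: {post}\n"
                  f"Respond using {framework}; ground any diagnostic reasoning "
                  f"in {'/'.join(DIAGNOSTIC_GUIDES)}.")
        turns.append({"framework": framework, "response": llm(prompt)})
    return turns

def passes_filters(dialogue, min_turns=2):
    """Toy multi-dimensional filter: enough turns, no empty responses."""
    return (len(dialogue) >= min_turns
            and all(t["response"].strip() for t in dialogue))

def synthesize(posts, llm):
    """Keep only dialogues that survive filtering."""
    return [d for post in posts
            if passes_filters(d := generate_dialogue(post, llm))]
```

In practice `llm` would be a call to a real model; here it is left abstract so the control flow of generate-then-filter is visible on its own.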