🤖 AI Summary
High-quality, privacy-compliant psychotherapy dialogue data are scarce, hindering the fine-tuning and deployment of open-source large language models (LLMs) in mental health counseling. To address this, we propose a multi-agent collaborative generation framework that formalizes therapy dialogues as a cognitive behavioral therapy (CBT)-guided pipeline of specialized subtasks, each executed by dedicated LLM agents. We further design a comprehensive nine-dimensional evaluation framework integrating automated metrics and expert human assessment. The generated dialogues significantly outperform baselines in quality, diversity, and therapeutic alignment, achieving a 77.2% expert preference rate. Fine-tuning LLMs on this data yields substantial improvements: +6.3% in CTRS (Cognitive Therapy Rating Scale) general counseling skills and +7.3% in CBT-specific competencies. This work advances data-efficient, clinically grounded LLM adaptation for psychotherapy support.
📝 Abstract
The growing demand for scalable psychological counseling highlights the need to fine-tune open-source Large Language Models (LLMs) with high-quality, privacy-compliant data, yet such data remain scarce. Here we introduce MAGneT, a novel multi-agent framework for synthetic psychological counseling session generation that decomposes counselor response generation into coordinated sub-tasks handled by specialized LLM agents, each modeling a key psychological technique. Unlike prior single-agent approaches, MAGneT better captures the structure and nuance of real counseling. In addition, we address inconsistencies in prior evaluation protocols by proposing a unified evaluation framework that integrates diverse automatic and expert metrics. Furthermore, we expand expert evaluation from the four aspects of counseling used in prior work to nine, enabling a more thorough and robust assessment of data quality. Empirical results show that MAGneT significantly outperforms existing methods in the quality, diversity, and therapeutic alignment of generated counseling sessions, improving general counseling skills by 3.2% and CBT-specific skills by 4.3% on average on the Cognitive Therapy Rating Scale (CTRS). Crucially, experts prefer MAGneT-generated sessions in 77.2% of cases on average across all aspects. Moreover, an open-source model fine-tuned on MAGneT-generated sessions outperforms models fine-tuned on sessions generated by baseline methods, with average CTRS improvements of 6.3% on general counseling skills and 7.3% on CBT-specific skills. We also make our code and data public.
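The abstract's core idea, decomposing a single counselor turn into coordinated sub-tasks handled by specialized agents, can be illustrated with a minimal sketch. This is not the authors' implementation: the agent names (reflection, questioning, psychoeducation), the stubbed LLM calls, and the simple concatenation-based coordinator are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Agent:
    # Counseling technique this agent models (illustrative labels).
    name: str
    # Maps a client utterance to this agent's partial contribution;
    # in a real system this would wrap a prompted LLM call.
    respond: Callable[[str], str]

def make_stub(technique: str) -> Callable[[str], str]:
    # Stand-in for an LLM specialized to one psychological technique.
    return lambda utterance: f"[{technique}] response to: {utterance}"

def counselor_turn(agents: List[Agent], utterance: str) -> str:
    # Each specialized agent produces a partial contribution; a
    # coordinator (here: plain concatenation) composes the final turn.
    parts = [agent.respond(utterance) for agent in agents]
    return " ".join(parts)

agents = [
    Agent("reflection", make_stub("reflection")),
    Agent("questioning", make_stub("questioning")),
    Agent("psychoeducation", make_stub("psychoeducation")),
]
turn = counselor_turn(agents, "I feel overwhelmed at work.")
print(turn)
```

The point of the decomposition is that each agent can be prompted and evaluated for one technique in isolation, while the coordinator controls how the techniques combine into a single coherent counselor response.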