🤖 AI Summary
In mental health, privacy constraints severely limit access to personalized clinical data, which hinders the development of robust diagnostic and intervention models. To address this, the authors propose MentalArena, a generative training framework based on self-play. The approach has two main components: (1) a Symptom Encoder that represents a patient from both cognitive and behavioral perspectives, paired with a Symptom Decoder that compares diagnosed symptoms with encoded symptoms and adjusts the dialogue accordingly, so that a closed-loop self-play process simulates both patient behavior and therapist decision-making; and (2) fine-tuning of multiple foundation models, namely GPT-3.5 and Llama-3-8B, on the generated data. Evaluated on six mental health and biomedical QA benchmarks against six advanced models, the fine-tuned models significantly outperform their counterparts, including GPT-4o. The work points toward a privacy-preserving way to train models for personalized psychological assessment and intervention.
📝 Abstract
Mental health disorders are among the most serious diseases in the world. Most people with such a disorder lack access to adequate care, which highlights the importance of training models for the diagnosis and treatment of mental health disorders. However, in the mental health domain, privacy concerns limit the accessibility of personalized treatment data, making it challenging to build powerful models. In this paper, we introduce MentalArena, a self-play framework that trains language models by generating domain-specific personalized data, yielding a model that can make personalized diagnoses and treatment recommendations (as a therapist) and provide information (as a patient). To accurately model human-like mental health patients, we devise Symptom Encoder, which simulates a real patient from both cognitive and behavioral perspectives. To address intent bias during patient-therapist interactions, we propose Symptom Decoder, which compares diagnosed symptoms with encoded symptoms and dynamically manages the dialogue between patient and therapist according to the identified deviations. We evaluated MentalArena on six benchmarks, covering biomedical QA and mental health tasks, and compared it with six advanced models. Our models, fine-tuned from GPT-3.5 and Llama-3-8B, significantly outperform their counterparts, including GPT-4o. We hope that our work can inspire future research on personalized care. Code is available at https://github.com/Scarelette/MentalArena/tree/main
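The loop described in the abstract (encode symptoms, let patient and therapist talk, compare diagnosed with encoded symptoms, steer the dialogue by the deviation) can be illustrated with a toy sketch. This is not the MentalArena implementation: the names `encode_symptoms`, `decode_deviation`, `toy_therapist`, and `self_play`, the profile format, and the keyword-matching "therapist" are all assumptions made for illustration, and no language model is involved.

```python
from dataclasses import dataclass


@dataclass
class EncodedSymptoms:
    """Hypothetical container for the two views named in the abstract."""
    cognitive: set[str]
    behavioral: set[str]

    def union(self) -> set[str]:
        return self.cognitive | self.behavioral


def encode_symptoms(profile: dict) -> EncodedSymptoms:
    """Stand-in for Symptom Encoder: split a profile into two symptom views."""
    return EncodedSymptoms(set(profile["thoughts"]), set(profile["actions"]))


def decode_deviation(diagnosed: set[str], encoded: EncodedSymptoms) -> dict:
    """Stand-in for Symptom Decoder: compare diagnosed and encoded symptoms."""
    truth = encoded.union()
    return {"missed": truth - diagnosed, "spurious": diagnosed - truth}


def toy_therapist(utterances: list[str], vocabulary: set[str]) -> set[str]:
    """Toy therapist: diagnose every known symptom the patient has mentioned."""
    text = " ".join(utterances)
    return {symptom for symptom in vocabulary if symptom in text}


def self_play(profile: dict, vocabulary: set[str], max_turns: int = 5) -> list[dict]:
    """Run the toy loop and return one training record per turn."""
    encoded = encode_symptoms(profile)
    # Assumption: the toy patient opens by mentioning a single symptom.
    first = sorted(encoded.union())[0]
    utterances = [f"I have been dealing with {first}."]
    records = []
    for turn in range(max_turns):
        diagnosed = toy_therapist(utterances, vocabulary)
        deviation = decode_deviation(diagnosed, encoded)
        records.append({
            "turn": turn,
            "dialogue": list(utterances),
            "diagnosis": sorted(diagnosed),
        })
        if not deviation["missed"] and not deviation["spurious"]:
            break  # diagnosis matches the encoded symptoms
        if deviation["missed"]:
            # Steer the dialogue toward one symptom the therapist missed.
            missed = sorted(deviation["missed"])[0]
            utterances.append(f"I also notice {missed}.")
    return records


if __name__ == "__main__":
    profile = {"thoughts": ["hopelessness", "rumination"], "actions": ["insomnia"]}
    vocabulary = {"hopelessness", "rumination", "insomnia", "panic"}
    for record in self_play(profile, vocabulary):
        print(record["turn"], record["diagnosis"])
```

In this sketch each record pairs a dialogue state with the diagnosis made from it, which is the kind of generated data a fine-tuning step could consume. In the framework itself, per the abstract, both roles are played by language models rather than by rule-based stubs.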