🤖 AI Summary
A lack of high-quality, ethically grounded dialogue benchmarks impedes rigorous evaluation of AI systems in mental health. Method: We introduce MentalChat16K, an English benchmark dataset that combines anonymized transcripts of interventions between behavioral health coaches and caregivers of patients in palliative or hospice care with a synthetic mental health counseling dataset. The curated corpus covers a diverse range of conditions, including depression, anxiety, and grief, and its construction prioritizes patient privacy, ethical considerations, and responsible data usage. Contribution: MentalChat16K offers a high-quality resource for developing and evaluating large language models for conversational mental health assistance, enabling research on empathetic, personalized AI to improve access to mental health support services.
📝 Abstract
We introduce MentalChat16K, an English benchmark dataset combining a synthetic mental health counseling dataset with anonymized transcripts of interventions between Behavioral Health Coaches and Caregivers of patients in palliative or hospice care. Covering a diverse range of conditions such as depression, anxiety, and grief, this curated dataset is designed to facilitate the development and evaluation of large language models for conversational mental health assistance. By providing a high-quality resource tailored to this critical domain, MentalChat16K aims to advance research on empathetic, personalized AI solutions that improve access to mental health support services. The dataset prioritizes patient privacy, ethical considerations, and responsible data usage. MentalChat16K presents a valuable opportunity for the research community to innovate AI technologies that can positively impact mental well-being.