🤖 AI Summary
This work addresses the scarcity of authentic counseling data and the privacy sensitivities that hinder deploying large language models for mental health support. To this end, the authors propose MindChat, a framework that generates high-fidelity synthetic dialogues—termed MindCorpus—through multi-agent role-playing augmented with a dual-loop feedback mechanism comprising turn-level critique-and-revision and conversation-level strategy optimization. The model is then trained with an integrated privacy-preserving pipeline combining LoRA-based parameter-efficient fine-tuning, federated learning, and differential privacy. Experimental results show that MindChat is competitive with existing baselines in both automatic and human evaluations while substantially reducing vulnerability to membership inference attacks, validating synthetic data as a way to achieve a favorable trade-off between utility and privacy.
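The turn-level critique-and-revision loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the `[critique]` prompt convention, and the round limit are all assumptions, and in practice `generate` and `critique` would wrap LLM calls rather than local functions.

```python
from typing import Callable, List

# Hypothetical signatures (not from the paper):
#   generate(history) -> drafts the next counselor turn
#   critique(history, draft) -> critique text, or "" if the draft is acceptable
Generator = Callable[[List[str]], str]
Critic = Callable[[List[str], str], str]

def critique_and_revise_turn(
    history: List[str],
    generate: Generator,
    critique: Critic,
    max_rounds: int = 3,
) -> str:
    """Turn-level loop: draft a counselor reply, then revise it until the
    critic raises no objection or the round budget runs out."""
    draft = generate(history)
    for _ in range(max_rounds):
        feedback = critique(history, draft)
        if not feedback:  # critic is satisfied; accept this turn
            break
        # Feed the critique back to the generator as extra context and redraft.
        draft = generate(history + [f"[critique] {feedback}"])
    return draft
```

The session-level loop would wrap this per-turn routine, carrying refined counseling strategies forward across generated conversations.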
📝 Abstract
Large language models (LLMs) have shown promise for mental health support, yet training such models is constrained by the scarcity and sensitivity of real counseling dialogues. In this article, we present MindChat, a privacy-preserving LLM for mental health support, together with MindCorpus, a synthetic multi-turn counseling dataset constructed via a multi-agent role-playing framework. To synthesize high-quality counseling data, our dialogue-construction framework employs a dual closed-loop feedback design that integrates psychological expertise and counseling techniques through role-playing: (i) turn-level critique-and-revision to improve coherence and counseling appropriateness within a session, and (ii) session-level strategy refinement to progressively enrich counselor behaviors across sessions. To mitigate privacy risks under decentralized data ownership, we fine-tune the base model using federated learning with parameter-efficient LoRA adapters and incorporate differentially private optimization to reduce membership and memorization risks. Experiments on synthetic-data quality assessment and counseling capability evaluation show that MindCorpus improves training effectiveness and that MindChat is competitive with existing general and counseling-oriented LLM baselines under both automatic LLM-judge and human evaluation protocols, while exhibiting reduced privacy leakage under membership inference attacks.
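The privacy-preserving training recipe combines three standard pieces: clients compute LoRA-adapter updates locally, each update is clipped and Gaussian noise is added (DP-SGD style), and a server averages the result (FedAvg). The sketch below shows one aggregation round under those assumptions; the function name, flat-vector representation of adapter weights, and all constants are illustrative, not taken from the paper.

```python
import math
import random
from typing import List

def dp_fedavg_round(
    client_updates: List[List[float]],
    clip_norm: float = 1.0,
    noise_multiplier: float = 0.5,
    seed: int = 0,
) -> List[float]:
    """One federated-averaging round over LoRA-adapter updates with
    per-client L2 clipping and Gaussian noise for differential privacy.
    Each update is a flat list of floats (a flattened adapter delta)."""
    rng = random.Random(seed)
    dim = len(client_updates[0])
    total = [0.0] * dim
    for update in client_updates:
        # Clip each client's contribution to bound its sensitivity.
        norm = math.sqrt(sum(x * x for x in update))
        scale = min(1.0, clip_norm / (norm + 1e-12))
        for i, x in enumerate(update):
            total[i] += x * scale
    # Noise scale is tied to the clipping bound, as in DP-SGD.
    sigma = noise_multiplier * clip_norm
    n = len(client_updates)
    return [(t + rng.gauss(0.0, sigma)) / n for t in total]
```

Because only the low-rank adapter deltas leave each client, the communication and noise costs scale with the adapter size rather than the full model, which is what makes combining LoRA with federated DP training practical.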