MentraSuite: Post-Training Large Language Models for Mental Health Reasoning and Assessment

📅 2025-12-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) in mental health face critical challenges, including incoherent reasoning, hallucination susceptibility, and poor clinical alignment. To address these, the authors propose MentraSuite, a unified framework comprising (1) MentraBench, the first five-dimensional benchmark for reasoning quality in mental health, evaluating conciseness, coherence, hallucination resistance, task comprehension, and internal consistency; and (2) Mindora, a model post-trained with a hybrid supervised fine-tuning–reinforcement learning (SFT-RL) method that incorporates an inconsistency-detection reward and structured reasoning-trajectory generation to enable stepwise, verifiable, and clinically grounded inference. Experiments show that Mindora achieves top-ranked performance on MentraBench and significantly outperforms 20 baseline models on complex clinical tasks—including clinical abstraction, diagnostic validation, and intervention planning—establishing a trustworthy, clinically informed reasoning paradigm for mental health AI.

📝 Abstract
Mental health disorders affect hundreds of millions globally, and the Web now serves as a primary medium for accessing support, information, and assessment. Large language models (LLMs) offer scalable and accessible assistance, yet their deployment in mental-health settings remains risky when their reasoning is incomplete, inconsistent, or ungrounded. Existing psychological LLMs emphasize emotional understanding or knowledge recall but overlook the step-wise, clinically aligned reasoning required for appraisal, diagnosis, intervention planning, abstraction, and verification. To address these issues, we introduce MentraSuite, a unified framework for advancing reliable mental-health reasoning. We propose MentraBench, a comprehensive benchmark spanning five core reasoning aspects, six tasks, and 13 datasets, evaluating both task performance and reasoning quality across five dimensions: conciseness, coherence, hallucination avoidance, task understanding, and internal consistency. We further present Mindora, a post-trained model optimized through a hybrid SFT-RL framework with an inconsistency-detection reward to enforce faithful and coherent reasoning. To support training, we construct high-quality trajectories using a novel reasoning trajectory generation strategy that strategically filters difficult samples and applies a structured, consistency-oriented rewriting process to produce concise, readable, and well-balanced trajectories. Across 20 evaluated LLMs, Mindora achieves the highest average performance on MentraBench and shows remarkable performance in reasoning reliability, demonstrating its effectiveness for complex mental-health scenarios.
Problem

Research questions and friction points this paper is trying to address.

Develops a framework for reliable mental health reasoning in LLMs
Addresses incomplete and inconsistent reasoning in mental health applications
Enhances step-wise clinical reasoning for assessment and diagnosis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Post-trained LLM with hybrid SFT-RL framework
Inconsistency-detection reward for faithful reasoning
Novel trajectory generation with filtering and rewriting
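The hybrid SFT-RL objective with an inconsistency-detection reward could be sketched roughly as below. This is an illustrative toy under stated assumptions, not the paper's implementation: the weighting `alpha`, the string-matching consistency checker, and all function names are hypothetical stand-ins for whatever learned or rule-based verifier the authors actually use.

```python
# Illustrative sketch of a reward combining task correctness with an
# inconsistency-detection penalty, in the spirit of the bullets above.
# All names and the toy heuristic are assumptions for illustration.

def inconsistency_penalty(reasoning_steps, final_answer):
    """Hypothetical checker: flag traces whose steps explicitly
    negate the stated final answer."""
    for step in reasoning_steps:
        if f"not {final_answer}".lower() in step.lower():
            return 1.0  # trace contradicts its own conclusion
    return 0.0

def hybrid_reward(reasoning_steps, final_answer, gold_answer, alpha=0.5):
    """Reward = task correctness minus a weighted inconsistency penalty,
    so the RL stage can prefer faithful, self-consistent traces."""
    task_reward = 1.0 if final_answer == gold_answer else 0.0
    penalty = inconsistency_penalty(reasoning_steps, final_answer)
    return task_reward - alpha * penalty

# Example: a consistent, correct trace receives the full reward.
steps = ["Symptoms suggest moderate depression.",
         "PHQ-9 score of 14 supports this appraisal."]
print(hybrid_reward(steps, "moderate depression", "moderate depression"))
```

In practice the penalty term would come from a trained verifier rather than string matching; the point of the sketch is only that the reward shaping lets correct-but-incoherent traces score lower than correct, self-consistent ones.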
Mengxi Xiao
Wuhan University
Psychology, Large Language Models
Kailai Yang
The University of Manchester
Natural Language Processing, Large Language Models
Pengde Zhao
School of Computer Science, Wuhan University
Enze Zhang
School of Artificial Intelligence, Wuhan University
Ziyan Kuang
Center for Language and Information Research, Wuhan University
Zhiwei Liu
The University of Manchester
Weiguang Han
School of Computer Science, Wuhan University
Shu Liao
Center for Language and Information Research, Wuhan University
Lianting Huang
Mount Holyoke College
Jinpeng Hu
Hefei University of Technology
Natural Language Processing, Named Entity Recognition, Summarization
Min Peng
School of Artificial Intelligence, Wuhan University
Qianqian Xie
Wuhan University
NLP, LLM
Sophia Ananiadou
Professor, Computer Science, Manchester University, National Centre for Text Mining
Natural Language Processing, Text Mining, Computational Linguistics, Artificial Intelligence