MentalSeek-Dx: Towards Progressive Hypothetico-Deductive Reasoning for Real-world Psychiatric Diagnosis

📅 2026-02-03
🤖 AI Summary
This study addresses two weaknesses of current benchmarks for LLM-based psychiatric diagnosis: limited ecological validity and a lack of fine-grained, hierarchical diagnostic supervision, both of which restrict use in real-world clinical settings. The authors introduce MentalDx Bench, the first benchmark for disorder-level psychiatric diagnosis in real-world clinical settings, comprising 712 de-identified electronic health records annotated by board-certified psychiatrists under ICD-11 guidelines and covering 76 disorders across 16 diagnostic categories. An evaluation of 18 LLMs shows strong performance at coarse diagnostic categorization but systematic failure at disorder-level diagnosis. In response, the authors propose MentalSeek-Dx, a medical-specialized LLM trained to internalize clinical hypothetico-deductive reasoning through supervised trajectory construction and curriculum-based reinforcement learning. With only 14B parameters, MentalSeek-Dx achieves state-of-the-art performance on MentalDx Bench.

📝 Abstract
Mental health disorders represent a burgeoning global public health challenge. While Large Language Models (LLMs) have demonstrated potential in psychiatric assessment, their clinical utility is severely constrained by benchmarks that lack ecological validity and fine-grained diagnostic supervision. To bridge this gap, we introduce MentalDx Bench, the first benchmark dedicated to disorder-level psychiatric diagnosis within real-world clinical settings. Comprising 712 de-identified electronic health records annotated by board-certified psychiatrists under ICD-11 guidelines, the benchmark covers 76 disorders across 16 diagnostic categories. Evaluation of 18 LLMs reveals a critical paradigm misalignment: strong performance at coarse diagnostic categorization contrasts with systematic failure at disorder-level diagnosis, underscoring a gap between pattern-based modeling and clinical hypothetico-deductive reasoning. In response, we propose MentalSeek-Dx, a medical-specialized LLM trained to internalize this clinical reasoning process through supervised trajectory construction and curriculum-based reinforcement learning. Experiments on MentalDx Bench demonstrate that MentalSeek-Dx achieves state-of-the-art (SOTA) performance with only 14B parameters, establishing a clinically grounded framework for reliable psychiatric diagnosis.
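
The gap the abstract describes between coarse categorization and disorder-level diagnosis can be illustrated by scoring the same predictions at both levels. The sketch below is only an illustration under stated assumptions: the function name `hierarchical_accuracy`, the `(category, disorder)` label format, and the toy labels are hypothetical and are not taken from the paper, whose actual metrics and data format are not given here.

```python
from typing import Dict, List, Tuple

# Hypothetical label format: (diagnostic category, specific disorder).
Label = Tuple[str, str]


def hierarchical_accuracy(gold: List[Label], pred: List[Label]) -> Dict[str, float]:
    """Return accuracy at the category level and at the disorder level.

    A prediction counts at the category level if the category matches,
    and at the disorder level only if category and disorder both match.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("at least one example is required")

    n = len(gold)
    category_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    disorder_hits = sum(g == p for g, p in zip(gold, pred))
    return {"category": category_hits / n, "disorder": disorder_hits / n}


# Toy data invented for illustration only (not benchmark records).
gold = [
    ("mood", "recurrent depressive disorder"),
    ("mood", "single episode depressive disorder"),
    ("anxiety", "panic disorder"),
    ("anxiety", "generalised anxiety disorder"),
]
pred = [
    ("mood", "single episode depressive disorder"),  # right category, wrong disorder
    ("mood", "single episode depressive disorder"),  # fully correct
    ("anxiety", "panic disorder"),                   # fully correct
    ("mood", "recurrent depressive disorder"),       # wrong category
]

scores = hierarchical_accuracy(gold, pred)
print(scores)  # {'category': 0.75, 'disorder': 0.5}
```

In this toy case the category-level score exceeds the disorder-level score, which is the shape of the mismatch the abstract reports, though the numbers here carry no empirical meaning.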
Problem

Research questions and friction points this paper is trying to address.

psychiatric diagnosis
ecological validity
disorder-level diagnosis
hypothetico-deductive reasoning
Large Language Models
Innovation

Methods, ideas, or system contributions that make the work stand out.

hypothetico-deductive reasoning
psychiatric diagnosis
clinical benchmark
large language models
reinforcement learning
Xiao Sun
School of Computer Science, Chongqing University, Chongqing, China
Yuming Yang
Fudan University
Natural Language Processing, Large Language Models
Junnan Zhu
Institute of Automation, Chinese Academy of Sciences
Natural Language Processing
Jiang Zhong
School of Computer Science, Chongqing University, Chongqing, China
Xinyu Zhou
The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
Kaiwen Wei
School of Computer Science, Chongqing University, Chongqing, China