PsychEthicsBench: Evaluating Large Language Models Against Australian Mental Health Ethics

📅 2026-01-07
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the limitations of current ethical evaluations of large language models (LLMs) in mental health applications, which overly rely on refusal rates and fail to capture clinically essential qualities such as empathy and professional conduct. To bridge this gap, the authors introduce PsychEthicsBench—the first multidimensional evaluation benchmark grounded in Australian psychological and psychiatric ethics guidelines. It systematically assesses models’ ethical knowledge and behavioral responses through multiple-choice and open-ended tasks, augmented with fine-grained human annotations. Evaluations across 14 models reveal a significant disconnect between refusal rates and actual ethical behavior, and further demonstrate that domain-specific fine-tuning can sometimes undermine ethical consistency, thereby highlighting the inadequacy of conventional safety metrics.

📝 Abstract
The increasing integration of large language models (LLMs) into mental health applications necessitates robust frameworks for evaluating professional safety alignment. Current evaluative approaches primarily rely on refusal-based safety signals, which offer limited insight into the nuanced behaviors required in clinical practice. In mental health, clinically inadequate refusals can be perceived as unempathetic and discourage help-seeking. To address this gap, we move beyond refusal-centric metrics and introduce PsychEthicsBench, the first principle-grounded benchmark based on Australian psychology and psychiatry guidelines, designed to evaluate LLMs' ethical knowledge and behavioral responses through multiple-choice and open-ended tasks with fine-grained ethicality annotations. Empirical results across 14 models show that refusal rates are poor indicators of ethical behavior, exposing a significant divergence between safety triggers and clinical appropriateness. Notably, we find that domain-specific fine-tuning can degrade ethical robustness, as several specialized models underperform their base backbones in ethical alignment. PsychEthicsBench provides a foundation for systematic, jurisdiction-aware evaluation of LLMs in mental health, encouraging more responsible development in this domain.
Problem

Research questions and friction points this paper is trying to address.

large language models
mental health ethics
safety alignment
clinical appropriateness
ethical evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

PsychEthicsBench
ethical alignment
mental health LLMs
refusal-centric evaluation
jurisdiction-aware benchmark
Yaling Shen
Monash University
AI in Healthcare · Natural Language Processing
Stephanie Fong
Monash University
Yiwen Jiang
Monash University
Zimu Wang
Tsinghua University
Feilong Tang
Monash University
Computer Vision · Foundation Model · Medical Image Analysis
Qingyang Xu
Monash University
Xiangyu Zhao
Monash University
Zhongxing Xu
Monash University
Jiahe Liu
Monash University
Jinpeng Hu
Hefei University of Technology
Natural Language Processing · Named Entity Recognition · Summarization
Dominic Dwyer
Monash University
Zongyuan Ge
Monash University