MedEthicEval: Evaluating Large Language Models Based on Chinese Medical Ethics

📅 2025-03-04
🤖 AI Summary
A lack of specialized evaluation benchmarks impedes the assessment of large language models (LLMs) in Chinese medical ethics. Method: We introduce MedEthicEval, the first domain-specific benchmark for this purpose. It features a three-tier difficulty taxonomy grounded in expert consensus (blatant violations, priority dilemmas, equilibrium dilemmas), a dual-dimensional evaluation framework (knowledge mastery and scenario-based application), and three high-quality, expert-annotated Chinese datasets. Evaluation employs structured prompting and multi-granularity scoring to quantify ethical reasoning capability. Contribution/Results: MedEthicEval fills a critical gap in the evaluation of AI for Chinese medical ethics. A comprehensive assessment of 12 mainstream Chinese and English LLMs reveals a pronounced weakness in resolving equilibrium dilemmas, that is, complex trade-off scenarios that require balanced moral judgment. The benchmark is fully reproducible and accompanied by targeted alignment strategies, advancing the ethical safety and responsible deployment of medical LLMs.

📝 Abstract
Large language models (LLMs) demonstrate significant potential in advancing medical applications, yet their capabilities in addressing medical ethics challenges remain underexplored. This paper introduces MedEthicEval, a novel benchmark designed to systematically evaluate LLMs in the domain of medical ethics. Our framework encompasses two key components: knowledge, assessing the models' grasp of medical ethics principles, and application, focusing on their ability to apply these principles across diverse scenarios. To support this benchmark, we consulted with medical ethics researchers and developed three datasets addressing distinct ethical challenges: blatant violations of medical ethics, priority dilemmas with clear inclinations, and equilibrium dilemmas without obvious resolutions. MedEthicEval serves as a critical tool for understanding LLMs' ethical reasoning in healthcare, paving the way for their responsible and effective use in medical contexts.
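The abstract describes evaluation along two components (knowledge and application) and three types of ethical challenge, but it does not spell out how per-item judgments are aggregated. As a rough illustration only, the sketch below assumes that each graded response is tagged with one component and one challenge type and is judged simply correct or incorrect. The `ItemResult` schema, the enum names, and the `breakdown` and `weakest_cell` helpers are hypothetical, and the paper's actual scoring rubric (described in the summary as multi-granularity) may differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Dimension(Enum):
    """The two evaluation components named in the abstract."""
    KNOWLEDGE = "knowledge"      # grasp of medical ethics principles
    APPLICATION = "application"  # applying those principles to scenarios


class Category(Enum):
    """The three challenge types covered by the datasets."""
    BLATANT_VIOLATION = "blatant_violation"
    PRIORITY_DILEMMA = "priority_dilemma"        # clear inclination exists
    EQUILIBRIUM_DILEMMA = "equilibrium_dilemma"  # no obvious resolution


@dataclass(frozen=True)
class ItemResult:
    """One graded model response (hypothetical schema, not the paper's)."""
    item_id: str
    dimension: Dimension
    category: Category
    correct: bool  # assumption: a binary judgment per item


def breakdown(results):
    """Return accuracy for each (dimension, category) cell that has items."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for r in results:
        key = (r.dimension, r.category)
        totals[key] += 1
        hits[key] += int(r.correct)
    return {key: hits[key] / totals[key] for key in totals}


def weakest_cell(scores):
    """Return the (dimension, category) cell with the lowest accuracy."""
    return min(scores, key=scores.get)


if __name__ == "__main__":
    # Toy, invented records for illustration; not results from the paper.
    toy = [
        ItemResult("a1", Dimension.APPLICATION, Category.BLATANT_VIOLATION, True),
        ItemResult("a2", Dimension.APPLICATION, Category.EQUILIBRIUM_DILEMMA, False),
        ItemResult("a3", Dimension.APPLICATION, Category.EQUILIBRIUM_DILEMMA, True),
    ]
    scores = breakdown(toy)
    for (dim, cat), acc in scores.items():
        print(f"{dim.value:12s} {cat.value:20s} {acc:.2f}")
    print("weakest:", weakest_cell(scores))
```

Reporting accuracy per cell rather than as a single overall number is what allows a weakness confined to one challenge type, such as the equilibrium dilemmas highlighted in the summary, to become visible at all.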

Problem

Research questions and friction points this paper is trying to address.

Evaluates LLMs on Chinese medical ethics understanding.
Assesses models' application of ethics in diverse scenarios.
Develops three datasets covering distinct ethical challenges in healthcare.

Innovation

Methods, ideas, or system contributions that make the work stand out.

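Introduces MedEthicEval, a benchmark for systematically evaluating LLMs on medical ethics.
Builds three datasets covering blatant violations, priority dilemmas, and equilibrium dilemmas.
Assesses both knowledge of medical ethics principles and their application across scenarios.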