LingLanMiDian: Systematic Evaluation of LLMs on TCM Knowledge and Clinical Reasoning

πŸ“… 2026-02-02
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Existing evaluation benchmarks for Chinese-medicine large language models suffer from incomplete coverage and inconsistent scoring criteria, hindering fair assessment of models' capabilities in traditional Chinese medicine (TCM) knowledge and clinical reasoning. To address this, the authors propose LingLanMiDian, a large-scale, expert-curated multitask benchmark encompassing four core tasks: knowledge recall, multi-hop reasoning, information extraction, and clinical decision-making. The framework introduces unified evaluation metrics, a synonym-tolerant clinical labeling protocol, and a challenging 400-question "Hard" subset per task, reframing diagnostic and therapeutic recommendation as single-choice decision identification. Evaluated under a zero-shot paradigm, the benchmark is standardized and extensible. Assessments of 14 leading large language models reveal substantial gaps between current models and human experts in TCM commonsense understanding and reasoning, particularly on the Hard subsets.


πŸ“ Abstract
Large language models (LLMs) are advancing rapidly in medical NLP, yet Traditional Chinese Medicine (TCM), with its distinctive ontology, terminology, and reasoning patterns, requires domain-faithful evaluation. Existing TCM benchmarks are fragmented in coverage and scale and rely on non-unified or generation-heavy scoring that hinders fair comparison. We present the LingLanMiDian (LingLan) benchmark, a large-scale, expert-curated, multi-task suite that unifies evaluation across knowledge recall, multi-hop reasoning, information extraction, and real-world clinical decision-making. LingLan introduces a consistent metric design, a synonym-tolerant protocol for clinical labels, a per-dataset 400-item Hard subset, and a reframing of diagnosis and treatment recommendation into single-choice decision recognition. We conduct comprehensive, zero-shot evaluations on 14 leading open-source and proprietary LLMs, providing a unified perspective on their strengths and limitations in TCM commonsense knowledge understanding, reasoning, and clinical decision support; critically, the evaluation on the Hard subsets reveals a substantial gap between current models and human experts in TCM-specialized reasoning. By bridging fundamental knowledge and applied reasoning through standardized evaluation, LingLan establishes a unified, quantitative, and extensible foundation for advancing TCM LLMs and domain-specific medical AI research. All evaluation data and code are available at https://github.com/TCMAI-BJTU/LingLan and http://tcmnlp.com.
Problem

Research questions and friction points this paper is trying to address.

Traditional Chinese Medicine
Large Language Models
Benchmark Evaluation
Clinical Reasoning
Medical NLP
Innovation

Methods, ideas, or system contributions that make the work stand out.

Traditional Chinese Medicine
Large Language Models
Benchmark Evaluation
Clinical Reasoning
Domain-Specific AI
Rui Hua
Department of Computer Science and Technology, Beijing Jiaotong University, Beijing, China
Yu Wei
Georgia Institute of Technology
Zixin Shu
Institute of Liver Diseases, Hubei Key Laboratory of the theory and application research of liver and kidney in traditional Chinese medicine, Hubei Provincial Hospital of Traditional Chinese Medicine, Wuhan, China
Kai Chang
Center for Quantum Matter, School of Physics, Zhejiang University, Hangzhou 310058, China
Dengying Yan
Department of Computer Science and Technology, Beijing Jiaotong University, Beijing, China
Jianan Xia
Department of Computer Science and Technology, Beijing Jiaotong University, Beijing, China
Zeyu Liu
Department of Computer Science and Technology, Beijing Jiaotong University, Beijing, China
Hui Zhu
Institute of Liver Diseases, Hubei Key Laboratory of the theory and application research of liver and kidney in traditional Chinese medicine, Hubei Provincial Hospital of Traditional Chinese Medicine, Wuhan, China
Shujie Song
Institute of Liver Diseases, Hubei Key Laboratory of the theory and application research of liver and kidney in traditional Chinese medicine, Hubei Provincial Hospital of Traditional Chinese Medicine, Wuhan, China
Mingzhong Xiao
Institute of Liver Diseases, Hubei Key Laboratory of the theory and application research of liver and kidney in traditional Chinese medicine, Hubei Provincial Hospital of Traditional Chinese Medicine, Wuhan, China
Xiaodong Li
Institute of Liver Diseases, Hubei Key Laboratory of the theory and application research of liver and kidney in traditional Chinese medicine, Hubei Provincial Hospital of Traditional Chinese Medicine, Wuhan, China
Dongmei Jia
Xiyuan Hospital, China Academy of Chinese Medical Sciences, Beijing, China
Zhuye Gao
Xiyuan Hospital, China Academy of Chinese Medical Sciences, Beijing, China
Yanyan Meng
Beijing Research Institute of Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China
Naixuan Zhao
Beijing University of Chinese Medicine Third Affiliated Hospital, Beijing University of Chinese Medicine, Beijing 100029, China
Yu Fu
School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China
Haibin Yu
The First Affiliated Hospital, Henan University of Chinese Medicine, Zhengzhou, China
Benman Yu
The First Affiliated Hospital, Henan University of Chinese Medicine, Zhengzhou, China
Yuanyuan Chen
The First Affiliated Hospital, Henan University of Chinese Medicine, Zhengzhou, China
Fei Dong
PhD Candidate, Singapore University of Technology and Design
Zhizhou Meng
School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China
Pengcheng Yang
Tianjin Tasly Digital Chinese Medicine Technology Co., Ltd., Tianjin, China
Songxue Zhao
Tianjin Tasly Digital Chinese Medicine Technology Co., Ltd., Tianjin, China
Lijuan Pei
Tianjin Tasly Digital Chinese Medicine Technology Co., Ltd., Tianjin, China
Yunhui Hu
Tianjin Tasly Digital Chinese Medicine Technology Co., Ltd., Tianjin, China