CDrugRed: A Chinese Drug Recommendation Dataset for Discharge Medications in Metabolic Diseases

📅 2025-10-23
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
The scarcity of non-English real-world electronic health record (EHR) data—particularly Chinese—hinders the development of multilingual intelligent medication recommendation systems. Method: To address this, we introduce CMED-DR, the first publicly available Chinese EHR dataset tailored for discharge medication recommendation in metabolic diseases. It comprises 5,894 de-identified records from 3,190 patients, encompassing structured clinical information including demographics, medical history, diagnoses, and treatment details. Contribution/Results: CMED-DR fills a critical gap in non-English discharge medication recommendation resources. Leveraging it, we conduct supervised fine-tuning of large language models, achieving benchmark F1 and Jaccard scores of 0.5648 and 0.4477, respectively—demonstrating both task difficulty and dataset validity. This work establishes a foundational resource and methodological paradigm for interpretable, accurate cross-lingual clinical decision support research.

Technology Category

Application Category

📝 Abstract
Intelligent drug recommendation based on Electronic Health Records (EHRs) is critical for improving for improving the quality and efficiency of clinical decision-making. By leveraging large-scale patient data, drug recommendation systems can assist physicians in selecting the most appropriate medications according to a patient's medical history, diagnoses, laboratory results, and comorbidities. However, the advancement of such systems is significantly hampered by the scarcity of publicly available, real-world EHR datasets, particularly in languages other than English. In this work, we present CDrugRed, a first publicly available Chinese drug recommendation dataset focused on discharge medications for metabolic diseases. The dataset includes 5,894 de-identified records from 3,190 patients, containing comprehensive information such as patient demographics, medical history, clinical course, and discharge diagnoses. We assess the utility of CDrugRed by benchmarking several state-of-the-art large language models (LLMs) on the discharge medication recommendation task. Experimental results show that while supervised fine-tuning improves model performance, there remains substantial room for improvement, with the best model achieving the F1 score of 0.5648 and Jaccard score of 0.4477. This result highlights the complexity of the clinical drug recommendation task and establishes CDrugRed as a challenging and valuable resource for developing more robust and accurate drug recommendation systems. The dataset is publicly available to the research community under the data usage agreements at https://github.com/DUTIR-BioNLP/CDrugRed.
Problem

Research questions and friction points this paper is trying to address.

Addressing scarcity of public Chinese EHR datasets for drug recommendation
Developing discharge medication recommendation for metabolic diseases in Chinese
Evaluating LLMs' performance on clinical drug recommendation tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Created first Chinese drug recommendation dataset
Benchmarked large language models on medication tasks
Used supervised fine-tuning to improve model performance
🔎 Similar Papers
No similar papers found.
Juntao Li
Juntao Li
Soochow University
Language ModelsText Generation
H
Haobin Yuan
College of Computer Science and Technology, Dalian University of Technology, Dalian, China
L
Ling Luo
College of Computer Science and Technology, Dalian University of Technology, Dalian, China
Y
Yan Jiang
Department of Pharmacy, Second Affiliated Hospital of Dalian Medical University, Dalian, China
F
Fan Wang
Department of Pharmacy, Second Affiliated Hospital of Dalian Medical University, Dalian, China
P
Ping Zhang
Department of Pharmacy, Second Affiliated Hospital of Dalian Medical University, Dalian, China
H
Huiyi Lv
Department of Pharmacy, Second Affiliated Hospital of Dalian Medical University, Dalian, China
J
Jian Wang
College of Computer Science and Technology, Dalian University of Technology, Dalian, China
Y
Yuanyuan Sun
College of Computer Science and Technology, Dalian University of Technology, Dalian, China
Hongfei Lin
Hongfei Lin
DalianUniversity of Technology
natural language processing,sentimental analysistext miningsocial computing