🤖 AI Summary
This study addresses the extraction and standardization of medication-related information (e.g., dosage, route of administration, strength, adverse effects) from clinical text. It evaluates individually fine-tuned BERT-family models (BERT, RoBERTa, RoBERTa-L, BioBERT, BioClinicalBERT, BioMedRoBERTa, ClinicalBERT, and PubMedBERT) and combines them with two ensemble learning methods, Stack-Ensemble and Voting-Ensemble. The ensembles outperform the individual base models across general and specific domains. An entity linking function maps the extracted terms to SNOMED-CT and BNF codes, which are further mapped to dm+d and ICD. The toolkit and desktop applications are publicly available.
📝 Abstract
Medication extraction and mining play an important role in healthcare NLP research because of their practical applications in hospital settings, such as mapping extracted medications to standard clinical knowledge bases (SNOMED-CT, BNF, etc.). In this work, we investigate state-of-the-art LLMs on text mining tasks for medications and their related attributes, such as dosage, route, strength, and adverse effects. In addition, we explore two ensemble learning methods (Stack-Ensemble and Voting-Ensemble) to improve on the performance of individual LLMs. Our ensemble learning results show better performance than the individually fine-tuned base models BERT, RoBERTa, RoBERTa-L, BioBERT, BioClinicalBERT, BioMedRoBERTa, ClinicalBERT, and PubMedBERT across general and specific domains. Finally, we build an entity linking function that maps extracted medical terminology to SNOMED-CT codes and British National Formulary (BNF) codes, which are further mapped to the Dictionary of Medicines and Devices (dm+d) and ICD. Our model's toolkit and desktop applications are publicly available (https://github.com/HECTA-UoM/ensemble-NER).
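To make the voting idea concrete, the following is a minimal sketch of token-level majority voting across several NER models. It is an illustration under stated assumptions, not the paper's implementation: the function name `majority_vote`, the BIO-style labels (`B-Drug`, `B-Strength`, `B-Route`), and the tie-breaking rule (fall back to `O`) are all hypothetical, and the abstract does not specify how Voting-Ensemble resolves ties or weights models.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]], fallback: str = "O") -> List[str]:
    """Combine per-token labels from several models by simple majority vote.

    `predictions` holds one label sequence per model; all sequences must
    cover the same tokens. On a tie for first place, the `fallback` label
    is used (an assumption made for this sketch).
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all models must label the same number of tokens")

    voted = []
    for token_labels in zip(*predictions):
        # Count how many models proposed each label for this token.
        ranked = Counter(token_labels).most_common()
        top_label, top_count = ranked[0]
        # A tie exists if the runner-up has the same count as the winner.
        if len(ranked) > 1 and ranked[1][1] == top_count:
            voted.append(fallback)
        else:
            voted.append(top_label)
    return voted


# Hypothetical outputs of three models for: "Take aspirin 75 mg orally"
model_a = ["O", "B-Drug", "B-Strength", "I-Strength", "B-Route"]
model_b = ["O", "B-Drug", "B-Strength", "I-Strength", "O"]
model_c = ["O", "B-Drug", "O", "I-Strength", "B-Route"]

ensemble_labels = majority_vote([model_a, model_b, model_c])
print(ensemble_labels)
```

A stacking ensemble differs in that the base models' outputs are fed as features to a second-stage learner rather than counted directly; that variant is not sketched here.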