Ensemble BERT for Medication Event Classification on Electronic Health Records (EHRs)

📅 2025-06-29



🤖 AI Summary

This study addresses the detection and classification of medication events in clinical text from electronic health records (EHRs). It proposes an ensemble of BERT models, each pretrained on a different corpus (Wikipedia or MIMIC-III) and fine-tuned on the CMED dataset, whose predictions are combined by weighted voting. The key idea is to pair domain-adaptive pretraining with model-level ensembling to offset the limited generalization of any single model. Under strict evaluation, the ensemble improves Micro-F1 by approximately 5.0% and Macro-F1 by approximately 6.2% (absolute) over the baseline models. The method supports automated information extraction for clinical decision support and pharmacovigilance.
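The summary mentions weighted voting but does not say how the weights are chosen. A minimal sketch, assuming each model's vote is weighted by a hypothetical per-model score (for example its development-set F1) and assuming the three CMED event labels; `weighted_vote` and `LABELS` are illustrative names, not the authors' code:

```python
# Hypothetical label set, assumed to be the three CMED event classes.
LABELS = ("Disposition", "NoDisposition", "Undetermined")


def weighted_vote(predictions, weights):
    """Combine one label per model into a single label for one mention.

    predictions: labels from the fine-tuned models, one per model.
    weights: non-negative floats, one per model (e.g. each model's
             dev-set F1; this weighting choice is an assumption).
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per model")
    scores = {label: 0.0 for label in LABELS}
    for label, weight in zip(predictions, weights):
        if label not in scores:
            raise ValueError(f"unknown label: {label}")
        scores[label] += weight
    # max() returns the first maximal item, so ties resolve in LABELS order.
    return max(LABELS, key=scores.__getitem__)


# One strong model outweighs two weaker ones: 0.9 > 0.4 + 0.4.
print(weighted_vote(["NoDisposition", "Disposition", "Disposition"], [0.9, 0.4, 0.4]))
# prints NoDisposition
```

With equal weights the same three votes would return `Disposition`, so the weights decide whether a single trusted model can override a simple majority.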


📝 Abstract

Identifying key variables such as medications, diseases, and relations in health records and clinical notes has a wide range of applications in the clinical domain. The n2c2 2022 challenge offered shared tasks on natural language processing for clinical data analytics on electronic health records (EHRs) and released a comprehensively annotated clinical dataset, the Contextualized Medication Event Dataset (CMED). This study focuses on Subtask 2 of Track 1, which is to detect and classify medication events in clinical notes, and addresses it by building a novel BERT-based ensemble model. BERT models were first pretrained on different types of large-scale data, such as Wikipedia and MIMIC, and then fine-tuned on the CMED training data. Each fine-tuned model produced its own medication event classification on the CMED test data, and these multiple predictions were combined into a final prediction using voting strategies. Experimental results show that the BERT-based ensemble models improve the strict Micro-F score by about 5% and the strict Macro-F score by about 6%.
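The voting step can be pictured as a plain plurality vote over the labels that the fine-tuned models assign to each medication mention. This is a minimal sketch under stated assumptions: `majority_vote` and the three model output lists are hypothetical, the labels are assumed to be CMED's event classes, and ties fall back to the earliest model in the list, which is one reasonable choice rather than the paper's documented rule.

```python
from collections import Counter


def majority_vote(model_predictions):
    """Plurality vote across models.

    model_predictions[m][i] is the label model m assigned to mention i.
    Returns one label per mention.
    """
    n_mentions = len(model_predictions[0])
    if any(len(preds) != n_mentions for preds in model_predictions):
        raise ValueError("all models must label the same mentions")
    final = []
    for votes in zip(*model_predictions):
        # Counter.most_common lists equal counts in first-encountered order,
        # so a tie goes to the label from the earliest model in the list.
        label, _count = Counter(votes).most_common(1)[0]
        final.append(label)
    return final


# Hypothetical outputs of three fine-tuned models on three mentions.
bert_wiki = ["Disposition", "NoDisposition", "Undetermined"]
bert_mimic = ["Disposition", "Disposition", "NoDisposition"]
bert_third = ["NoDisposition", "NoDisposition", "Disposition"]
print(majority_vote([bert_wiki, bert_mimic, bert_third]))
# prints ['Disposition', 'NoDisposition', 'Undetermined']
```

The third mention is a three-way tie, so the order in which models are listed matters; placing the strongest single model first is a simple way to make that tie-break favor it.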

Problem

Research questions and friction points this paper is trying to address.

Classify medication events from clinical notes

Improve accuracy using BERT-based ensemble models

Improve strict Micro-F and Macro-F scores
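Strict Micro-F and Macro-F differ in how they pool counts: micro-averaging sums true positives, false positives, and false negatives over all classes, while macro-averaging computes F1 per class and takes the unweighted mean, so errors on a rare class weigh more. The sketch below is an illustrative approximation, not the official n2c2 scorer; `strict_micro_macro_f1` is a hypothetical helper, and the exact-offset-plus-label matching rule is an assumed reading of "strict".

```python
def strict_micro_macro_f1(gold, pred, labels):
    """Micro- and macro-averaged F1 under exact (strict) matching.

    gold, pred: sets of (doc_id, start, end, label) tuples. A prediction is
    a true positive only if the document, both offsets and the label all
    match a gold tuple exactly (an assumption about "strict").
    """
    tp = {label: 0 for label in labels}
    fp = {label: 0 for label in labels}
    fn = {label: 0 for label in labels}
    for item in pred:
        if item in gold:
            tp[item[3]] += 1
        else:
            fp[item[3]] += 1
    for item in gold:
        if item not in pred:
            fn[item[3]] += 1

    def f1(t, p, n):
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the class is absent.
        denom = 2 * t + p + n
        return 2 * t / denom if denom else 0.0

    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[label], fp[label], fn[label]) for label in labels) / len(labels)
    return micro, macro


labels = ["Disposition", "NoDisposition", "Undetermined"]
gold = {("d1", 0, 7, "NoDisposition"), ("d1", 20, 28, "NoDisposition"),
        ("d1", 40, 49, "NoDisposition"), ("d1", 60, 66, "Disposition"),
        ("d1", 80, 88, "Undetermined")}
pred = {("d1", 0, 7, "NoDisposition"), ("d1", 20, 28, "NoDisposition"),
        ("d1", 40, 49, "NoDisposition"), ("d1", 60, 66, "Disposition"),
        ("d1", 80, 88, "NoDisposition")}
micro, macro = strict_micro_macro_f1(gold, pred, labels)
print(round(micro, 3), round(macro, 3))  # prints 0.8 0.619
```

Here a single Undetermined mention mislabeled as NoDisposition costs 0.2 in Micro-F1 but drives the Undetermined F1 to zero, pulling Macro-F1 down to about 0.62.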

Innovation

Methods, ideas, or system contributions that make the work stand out.

Ensemble BERT models for medication classification

Pretraining BERT on Wikipedia and MIMIC data

Voting strategies integrate predictions from multiple models
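Besides voting on final labels, a common voting variant averages the models' class probabilities (soft voting). The source does not say whether this variant is used; the following is a hypothetical sketch with made-up probabilities, where `soft_vote` is an illustrative name and the label order is assumed to follow the three CMED event classes.

```python
def soft_vote(prob_rows, labels, weights=None):
    """Soft voting for one mention.

    prob_rows[m][k] is model m's probability for labels[k]. Returns the
    label with the highest (optionally weighted) mean probability, plus
    the averaged distribution.
    """
    if weights is None:
        weights = [1.0] * len(prob_rows)
    if len(weights) != len(prob_rows):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Weighted mean probability for each class.
    avg = [
        sum(w * row[k] for w, row in zip(weights, prob_rows)) / total
        for k in range(len(labels))
    ]
    best = max(range(len(labels)), key=avg.__getitem__)
    return labels[best], avg


labels = ["Disposition", "NoDisposition", "Undetermined"]
# One confident model against two hesitant ones (hypothetical numbers).
rows = [[0.90, 0.05, 0.05], [0.30, 0.40, 0.30], [0.35, 0.45, 0.20]]
label, avg = soft_vote(rows, labels)
print(label)  # prints Disposition
```

Two of the three models prefer NoDisposition, so a hard majority vote would output it, but the averaged distribution favors Disposition (about 0.52 versus 0.30) because the first model is far more confident. Soft voting exploits this confidence information, whereas hard voting only needs each model's final label.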

🔎 Similar Papers

INSIGHTBUDDY-AI: Medication Extraction and Entity Linking using Large Language Models and Ensemble Learning