Multilingual Clinical NER for Diseases and Medications Recognition in Cardiology Texts using BERT Embeddings

📅 2025-10-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the scarcity of research on disease and medication named entity recognition (NER) in cardiology clinical texts written in low-resource languages, specifically Spanish and Italian. We compare monolingual and multilingual BERT-based contextual embedding models, trained on general-domain text, on clinical case reports in English, Spanish, and Italian. Training and evaluation are conducted within the BioASQ MultiCardioNER shared task. Our approach achieves an F1-score of 77.88% for disease recognition in Spanish and 92.09%, 91.74%, and 88.90% for medication recognition in Spanish, English, and Italian, respectively, surpassing both the mean and median F1 scores of the test leaderboard on all four subtasks.

📝 Abstract
The rapidly increasing volume of electronic health record (EHR) data underscores a pressing need to unlock biomedical knowledge from unstructured clinical texts to support advancements in data-driven clinical systems, including patient diagnosis, disease progression monitoring, treatment effect assessment, and prediction of future clinical events. While contextualized language models have demonstrated impressive performance improvements for named entity recognition (NER) systems in English corpora, there remains a scarcity of research focused on clinical texts in low-resource languages. To bridge this gap, our study develops multiple deep contextual embedding models to enhance clinical NER in the cardiology domain, as part of the BioASQ MultiCardioNER shared task. We explore the effectiveness of different monolingual and multilingual BERT-based models, trained on general-domain text, for extracting disease and medication mentions from clinical case reports written in English, Spanish, and Italian. We achieved an F1-score of 77.88% on Spanish Diseases Recognition (SDR), 92.09% on Spanish Medications Recognition (SMR), 91.74% on English Medications Recognition (EMR), and 88.90% on Italian Medications Recognition (IMR). These results outperform the mean and median F1 scores in the test leaderboard across all subtasks, with the mean/median values being: 69.61%/75.66% for SDR, 81.22%/90.18% for SMR, 89.20%/88.96% for EMR, and 82.80%/87.76% for IMR.
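
The comparison against the leaderboard can be checked directly from the figures quoted in the abstract. The following is a minimal sketch and not part of the paper: the dictionary layout and the helper name `margins` are illustrative assumptions, and only the numbers come from the abstract.

```python
# F1 scores (%) quoted in the abstract:
# (reported score, leaderboard mean, leaderboard median).
scores = {
    "SDR": (77.88, 69.61, 75.66),
    "SMR": (92.09, 81.22, 90.18),
    "EMR": (91.74, 89.20, 88.96),
    "IMR": (88.90, 82.80, 87.76),
}


def margins(ours, mean, median):
    """Return the gap in percentage points over the mean and over the median."""
    return round(ours - mean, 2), round(ours - median, 2)


for task, (ours, mean, median) in scores.items():
    over_mean, over_median = margins(ours, mean, median)
    print(f"{task}: +{over_mean:.2f} over mean, +{over_median:.2f} over median")
```

On these figures, the narrowest margin is 1.14 points over the median for IMR, and the widest is 10.87 points over the mean for SMR.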
Problem

Research questions and friction points this paper is trying to address.

Developing multilingual clinical NER for cardiology texts
Extracting disease and medication mentions from clinical reports
Addressing low-resource language challenges in clinical NER
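
Token-classification NER systems commonly emit one label per token and then group those labels into mentions. This page does not state which tagging scheme the authors used, so the sketch below assumes a BIO scheme. The label names (`B-DISEASE`, `B-MEDICATION`), the function `decode_bio`, and the example sentence are all hypothetical.

```python
def decode_bio(tokens, labels):
    """Group BIO labels into (entity_type, mention_text) pairs."""
    mentions = []
    current_type, current_tokens = None, []

    def flush():
        # Close the mention being built, if any.
        nonlocal current_type, current_tokens
        if current_tokens:
            mentions.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            flush()
            current_type, current_tokens = label[2:], [token]
        elif label.startswith("I-") and current_type == label[2:]:
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open mention.
            flush()
    flush()
    return mentions


# Hypothetical Spanish example, not taken from the shared-task data.
tokens = ["Paciente", "con", "insuficiencia", "cardíaca", "tratado", "con", "furosemida"]
labels = ["O", "O", "B-DISEASE", "I-DISEASE", "O", "O", "B-MEDICATION"]
print(decode_bio(tokens, labels))
```

In this sketch a stray `I-` tag with no open mention of the same type is dropped rather than promoted to a new mention; other decoders make the opposite choice.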
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multilingual BERT models for clinical entity recognition
Deep contextual embeddings for cardiology text analysis
Cross-lingual disease and medication extraction system
Manuela Daniela Danu
Advanta, Siemens SRL, 15 Noiembrie Bvd, 500097 Brasov, Romania
George Marica
Advanta, Siemens SRL, 15 Noiembrie Bvd, 500097 Brasov, Romania
Constantin Suciu
Advanta, Siemens SRL, 15 Noiembrie Bvd, 500097 Brasov, Romania
Lucian Mihai Itu
Unknown affiliation
Computational Fluid Dynamics, Coronary Circulation, Reduced-order modeling, Machine learning, Parameter estimation methods
Oladimeji Farri
Digital Technology and Innovation, Siemens Healthineers, 755 College Rd E, 08540 Princeton, NJ, United States