Leveraging Open-Source Large Language Models for Clinical Information Extraction in Resource-Constrained Settings

📅 2025-07-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the challenges of clinical information extraction from medical reports in resource-constrained settings (unstructured text, domain-specific linguistic complexity, the opacity of proprietary models, and data privacy risks) by proposing llm_extractinator, an open-source framework. Methodologically, it conducts a systematic zero-shot evaluation of open-weight large language models, including Phi-4-14B, Qwen-2.5-14B, DeepSeek-R1-14B, and Llama-3.3-70B, on Dutch clinical texts, using native-language prompting to avoid translation-induced degradation. Results show that the 14B-parameter models (Phi-4, Qwen-2.5, DeepSeek-R1) achieve competitive performance, substantially outperforming smaller models, while the 70B model yields only marginal gains at considerably greater computational cost. The core contribution is the empirical validation of lightweight, open-source LLMs for real-world clinical NLP tasks, demonstrating their efficacy, feasibility, and privacy-preserving potential. The framework and evaluation benchmark are publicly released to foster reproducible, privacy-aware clinical AI deployment.

📝 Abstract
Medical reports contain rich clinical information but are often unstructured and written in domain-specific language, posing challenges for information extraction. While proprietary large language models (LLMs) have shown promise in clinical natural language processing, their lack of transparency and data privacy concerns limit their utility in healthcare. This study therefore evaluates nine open-source generative LLMs on the DRAGON benchmark, which includes 28 clinical information extraction tasks in Dutch. We developed llm_extractinator, a publicly available framework for information extraction using open-source generative LLMs, and used it to assess model performance in a zero-shot setting. Several 14-billion-parameter models (Phi-4-14B, Qwen-2.5-14B, and DeepSeek-R1-14B) achieved competitive results, while the larger Llama-3.3-70B model achieved slightly higher performance at greater computational cost. Translation to English prior to inference consistently degraded performance, highlighting the need for native-language processing. These findings demonstrate that open-source LLMs, when used with our framework, offer effective, scalable, and privacy-conscious solutions for clinical information extraction in low-resource settings.
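The zero-shot, native-language extraction workflow the abstract describes can be sketched roughly as follows. This is an illustrative sketch, not the actual llm_extractinator API: the prompt wording, the `run_model` stub (which stands in for a locally hosted open-weight model), and the example task schema are all assumptions.

```python
import json


def build_prompt(report: str, task: str) -> str:
    """Compose a zero-shot extraction prompt in the report's own
    language (Dutch here), avoiding translation-induced degradation."""
    return (
        "Je bent een klinisch NLP-assistent.\n"  # "You are a clinical NLP assistant."
        f"Taak: {task}\n"
        f"Rapport:\n{report}\n"
        "Antwoord uitsluitend met geldige JSON."  # "Answer only with valid JSON."
    )


def run_model(prompt: str) -> str:
    """Stub standing in for inference with a locally hosted open-weight
    LLM (e.g. a 14B model); here it returns a canned JSON answer."""
    return '{"tumor_aanwezig": true, "lokalisatie": "linker bovenkwab"}'


def extract(report: str, task: str) -> dict:
    """Prompt the model and parse its JSON output; a production framework
    would also validate against the task schema and retry on parse errors."""
    raw = run_model(build_prompt(report, task))
    return json.loads(raw)


result = extract(
    "CT thorax: massa in de linker bovenkwab, verdacht voor maligniteit.",
    "Bepaal of een tumor aanwezig is en geef de lokalisatie.",
)
```

Because the model runs locally and only structured JSON leaves the pipeline, patient text never has to be sent to a third-party API, which is the privacy argument the paper makes for open-weight models.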
Problem

Research questions and friction points this paper is trying to address.

Extracting clinical data from unstructured medical reports
Addressing privacy and transparency in clinical NLP with open-source LLMs
Evaluating LLMs for Dutch clinical tasks in resource-limited settings
Innovation

Methods, ideas, or system contributions that make the work stand out.

Open-source LLMs for clinical extraction
Native-language processing framework
Privacy-conscious scalable solution
Luc Builtjes
Diagnostic Image Analysis Group, Department of Medical Imaging, Radboud University Medical Center, Nijmegen, The Netherlands.
Joeran Bosma
Diagnostic Image Analysis Group, Department of Medical Imaging, Radboud University Medical Center, Nijmegen, The Netherlands.
Mathias Prokop
Professor of Radiology, Radboudumc
Computed tomography, computer-aided diagnosis, lung cancer, stroke
Bram van Ginneken
Professor of Medical Image Analysis, Radboud University
Medical Image Analysis, Medical Imaging, Deep Learning, Computer-Aided Diagnosis
Alessa Hering
Radboud University Medical Center
Deep Learning, Image Registration, Tumor Follow-Up, LLM