🤖 AI Summary
This study addresses the challenges of clinical information extraction from medical reports in resource-constrained settings—namely, unstructured text, domain-specific linguistic complexity, opacity of proprietary models, and data privacy risks—by proposing LLM-Extractinator, an open-source framework. Methodologically, it conducts a systematic zero-shot evaluation of open-weight large language models—including Phi-4-14B, Qwen-2.5-14B, DeepSeek-R1-14B, and Llama-3.3-70B—on Dutch clinical texts, processing reports in their original language to avoid translation-induced degradation. Results show that 14B-parameter models (Phi-4, Qwen-2.5, DeepSeek-R1) achieve competitive performance, while the 70B model yields only slightly higher scores at substantially greater computational cost. The core contribution is the empirical validation of lightweight, open-source LLMs for real-world clinical NLP tasks, demonstrating their efficacy, feasibility, and privacy-preserving potential. The framework and evaluation setup are publicly released to foster reproducible, privacy-aware clinical AI deployment.
📝 Abstract
Medical reports contain rich clinical information but are often unstructured and written in domain-specific language, posing challenges for information extraction. While proprietary large language models (LLMs) have shown promise in clinical natural language processing, their lack of transparency and data privacy concerns limit their utility in healthcare. This study therefore evaluates nine open-source generative LLMs on the DRAGON benchmark, which includes 28 clinical information extraction tasks in Dutch. We developed `llm_extractinator`, a publicly available framework for information extraction using open-source generative LLMs, and used it to assess model performance in a zero-shot setting. Several 14-billion-parameter models—Phi-4-14B, Qwen-2.5-14B, and DeepSeek-R1-14B—achieved competitive results, while the larger Llama-3.3-70B model achieved slightly higher performance at greater computational cost. Translation to English prior to inference consistently degraded performance, highlighting the need for native-language processing. These findings demonstrate that open-source LLMs, when used with our framework, offer effective, scalable, and privacy-conscious solutions for clinical information extraction in low-resource settings.
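The zero-shot, native-language pipeline described above can be sketched as follows. This is a minimal illustration, not the actual `llm_extractinator` API: the function names `build_zero_shot_prompt` and `parse_model_output` are hypothetical, and the actual model call (e.g. to a locally hosted open-weight LLM) is left as a placeholder.

```python
import json

def build_zero_shot_prompt(task_description: str, report_text: str) -> str:
    """Compose a zero-shot extraction prompt in the report's own language.

    No few-shot examples are included: the model must rely on the task
    description alone, and the report is passed untranslated, since
    translating to English before inference was found to degrade results.
    """
    return (
        f"{task_description}\n\n"
        # Dutch instruction: "Return the answer as a JSON object."
        "Geef het antwoord als een JSON-object.\n\n"
        f"Rapport:\n{report_text}\n\n"
        "Antwoord:"
    )

def parse_model_output(raw: str) -> dict:
    """Pull the first JSON object out of a free-form model completion."""
    start = raw.find("{")
    end = raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    return json.loads(raw[start:end + 1])

# In practice, the prompt would be sent to a locally hosted open-weight
# model (a privacy-preserving setup, since data never leaves the site),
# and the completion fed to parse_model_output for structured results.
```

Keeping both the prompt and the report in Dutch reflects the paper's finding that native-language processing outperforms translate-then-extract pipelines.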