Evaluating Open-Weight Large Language Models for Structured Data Extraction from Narrative Medical Reports Across Multiple Use Cases and Languages

📅 2025-11-03
🤖 AI Summary
Clinical narrative reports, such as pathology and radiology reports written in different languages at different institutions, are difficult to convert into structured data, particularly in multilingual, multi-disease, and cross-institutional settings. Method: We systematically evaluated 15 open-weight large language models (LLMs) on six clinical use cases using six prompting strategies (zero-shot, one-shot, few-shot, chain-of-thought, self-consistency, and prompt graph) on real-world reports from three institutes in the Netherlands, the UK, and the Czech Republic. Performance was measured with task-appropriate metrics reported as macro-average scores, and models were compared using consensus rank aggregation and linear mixed-effects models that quantify the sources of variance. Contribution/Results: Small-to-medium general-purpose LLMs performed comparably to large models, whereas tiny and medical-specialised models performed worse; prompt graph and few-shot prompting gave the largest gains (about 13%). Task-specific factors, such as variable complexity and annotation variability, influenced performance more than model size or prompting strategy. The top-ranked models reached macro-average scores close to inter-rater agreement across tasks, indicating that open-weight LLMs offer a scalable approach to structuring clinical reports across diseases, languages, and institutions.
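
The macro-averaged scores mentioned above give every class equal weight, however rare it is. As an illustration only, assuming an F1 metric on a single categorical variable (the page does not state the exact metric used for each task), a macro-F1 can be computed as follows; the variable values are hypothetical:

```python
from collections import Counter


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this label, but it was wrong
            fn[truth] += 1  # missed the true label
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("need at least one example")
    # F1 = 2TP / (2TP + FP + FN), equivalent to the harmonic mean of precision and recall
    scores = [2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in labels]
    return sum(scores) / len(scores)


# Hypothetical annotations for one categorical variable (e.g. presence of a finding)
truth = ["present", "absent", "present", "unknown"]
pred = ["present", "present", "present", "unknown"]
print(round(macro_f1(truth, pred), 3))  # 0.6
```

Because the rare "absent" class scores 0, it pulls the macro average down to 0.6 even though 3 of 4 predictions are correct; a micro-averaged F1 (equal to accuracy, 0.75, in this single-label case) would hide that weakness.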

📝 Abstract
Large language models (LLMs) are increasingly used to extract structured information from free-text clinical records, but prior work often focuses on single tasks, limited models, and English-language reports. We evaluated 15 open-weight LLMs on pathology and radiology reports across six use cases (colorectal liver metastases, liver tumours, neurodegenerative diseases, soft-tissue tumours, melanomas, and sarcomas) at three institutes in the Netherlands, the UK, and the Czech Republic. Models included general-purpose and medical-specialised LLMs of various sizes, and six prompting strategies were compared: zero-shot, one-shot, few-shot, chain-of-thought, self-consistency, and prompt graph. Performance was assessed using task-appropriate metrics, with consensus rank aggregation and linear mixed-effects models quantifying variance. Top-ranked models achieved macro-average scores close to inter-rater agreement across tasks. Small-to-medium general-purpose models performed comparably to large models, while tiny and specialised models performed worse. Prompt graph and few-shot prompting improved performance by ~13%. Task-specific factors, including variable complexity and annotation variability, influenced results more than model size or prompting strategy. These findings show that open-weight LLMs can extract structured data from clinical reports across diseases, languages, and institutions, offering a scalable approach for clinical data curation.
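
The abstract does not specify which rank-aggregation rule was used. A minimal sketch of one common choice, mean rank across tasks (a Borda-style consensus), assuming each model has one score per task and that higher scores are better; the model and task names are hypothetical placeholders:

```python
from statistics import mean


def rank_within_task(scores: dict[str, float]) -> dict[str, float]:
    """Rank models on one task (1 = best, higher score is better); ties share the average rank."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    ranks: dict[str, float] = {}
    i = 0
    while i < len(ordered):
        j = i
        # extend j over the run of models tied with ordered[i]
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[ordered[k]] = shared
        i = j + 1
    return ranks


def consensus_ranking(per_task: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Order models by mean rank across tasks (assumes every model is scored on every task)."""
    task_ranks = [rank_within_task(scores) for scores in per_task.values()]
    models = task_ranks[0].keys()
    mean_rank = {m: mean(r[m] for r in task_ranks) for m in models}
    return sorted(mean_rank.items(), key=lambda item: item[1])


# Hypothetical scores; the model and task names are placeholders
per_task = {
    "task_A": {"model_x": 0.90, "model_y": 0.85, "model_z": 0.70},
    "task_B": {"model_x": 0.80, "model_y": 0.88, "model_z": 0.80},
}
print(consensus_ranking(per_task))
# [('model_y', 1.5), ('model_x', 1.75), ('model_z', 2.75)]
```

Mean rank deliberately ignores the size of score gaps, so one task with unusually large differences cannot dominate the consensus; the trade-off is that near-ties and large margins count the same.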
Problem

Research questions and friction points this paper is trying to address.

Evaluating open-weight LLMs for structured data extraction from medical narratives
Assessing performance across multiple clinical use cases and languages
Comparing model sizes and prompting strategies for clinical data curation
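
Self-consistency, one of the compared prompting strategies, samples several completions for the same prompt and keeps the most frequent answer. The sketch below covers only the voting step and rests on assumptions not stated on this page: answers have already been parsed from each completion (None marks an unparseable sample), and case and surrounding whitespace are irrelevant. The function name is hypothetical, and the LLM sampling itself is out of scope:

```python
from collections import Counter
from typing import Optional, Sequence


def self_consistency_vote(answers: Sequence[Optional[str]]) -> Optional[str]:
    """Majority vote over answers parsed from several sampled completions.

    Unparseable samples (None) are skipped; answers are compared after stripping
    whitespace and lower-casing; ties go to the answer seen first.
    """
    normalised = [a.strip().lower() for a in answers if a is not None]
    if not normalised:
        return None
    counts = Counter(normalised)
    top = max(counts.values())
    return next(a for a in normalised if counts[a] == top)


# Hypothetical parsed answers from four samples of the same prompt
samples = ["Positive", "negative", "positive ", None]
print(self_consistency_vote(samples))  # positive
```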
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluated 15 open-weight LLMs across multiple medical domains
Compared six prompting strategies, including prompt graph and few-shot
Achieved performance close to human inter-rater agreement
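
Zero-, one- and few-shot prompting differ mainly in how many worked examples precede the target report. A minimal sketch of that assembly, in which the function name, the "Report:"/"Answer:" template and the example data are hypothetical and not taken from the paper:

```python
from typing import Sequence, Tuple


def build_prompt(instruction: str, report: str,
                 examples: Sequence[Tuple[str, str]] = ()) -> str:
    """Assemble a k-shot prompt: k = 0 is zero-shot, k = 1 one-shot, k > 1 few-shot."""
    parts = [instruction.strip()]
    for i, (example_report, example_answer) in enumerate(examples, start=1):
        parts.append(
            f"Example {i}\nReport:\n{example_report.strip()}\nAnswer:\n{example_answer.strip()}"
        )
    # the target report goes last, with the answer left for the model to complete
    parts.append(f"Report:\n{report.strip()}\nAnswer:")
    return "\n\n".join(parts)


# Hypothetical instruction and examples, not taken from the study
instruction = 'Return the tumour grade as JSON, e.g. {"grade": "G2"}.'
examples = [("Well-differentiated tumour.", '{"grade": "G1"}'),
            ("Poorly differentiated tumour.", '{"grade": "G3"}')]
print(build_prompt(instruction, "Moderately differentiated tumour.", examples))
```

Keeping the examples in the same format as the expected answer (here JSON) is what lets one- and few-shot prompts steer the output structure, which a zero-shot prompt can only describe in words.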
D. J. Spaanderman
Department of Radiology and Nuclear Medicine, Erasmus MC Cancer Institute, University Medical Center Rotterdam, the Netherlands
Karthik Prathaban
Department of Pathology and Clinical Bioinformatics, Erasmus MC Cancer Institute, University Medical Center Rotterdam, the Netherlands
Petr Zelina
Faculty of Informatics, Masaryk University
Kaouther Mouheb
Department of Radiology and Nuclear Medicine, Alzheimer Center Erasmus MC, University Medical Center Rotterdam, the Netherlands
Lukáš Hejtmánek
Institute of Computer Science, Masaryk University
Matthew Marzetti
Department of Medical Physics, Leeds Teaching Hospitals NHS Trust, UK; Leeds Biomedical Research Centre, University of Leeds, UK
A. W. Schurink
Department of Surgical Oncology and Gastrointestinal Surgery, Erasmus MC Cancer Institute, University Medical Center Rotterdam, the Netherlands
Damian Chan
Department of Surgical Oncology and Gastrointestinal Surgery, Erasmus MC Cancer Institute, University Medical Center Rotterdam, the Netherlands
Ruben Niemantsverdriet
Department of Radiology and Nuclear Medicine, Erasmus MC Cancer Institute, University Medical Center Rotterdam, the Netherlands
Frederik Hartmann
Department of Radiology and Nuclear Medicine, Erasmus MC Cancer Institute, University Medical Center Rotterdam, the Netherlands
Zhen Qian
United Imaging
M. Thomeer
Department of Radiology and Nuclear Medicine, Erasmus MC Cancer Institute, University Medical Center Rotterdam, the Netherlands
Petr Holub
Department of Radiology and Nuclear Medicine, Alzheimer Center Erasmus MC, University Medical Center Rotterdam, the Netherlands
Farhan Akram
Department of Pathology and Clinical Bioinformatics, Erasmus MC Cancer Institute, University Medical Center Rotterdam, the Netherlands
F. Wolters
Department of Radiology and Nuclear Medicine, Alzheimer Center Erasmus MC, University Medical Center Rotterdam, the Netherlands
Meike W. Vernooij
Department of Radiology and Nuclear Medicine, Alzheimer Center Erasmus MC, University Medical Center Rotterdam, the Netherlands
Cornelis Verhoef
Department of Surgical Oncology and Gastrointestinal Surgery, Erasmus MC Cancer Institute, University Medical Center Rotterdam, the Netherlands
E. Bron
Department of Radiology and Nuclear Medicine, Alzheimer Center Erasmus MC, University Medical Center Rotterdam, the Netherlands
Vít Nováček
Bioinformatics Research Group, Masaryk Memorial Cancer Institute
D. Grunhagen
Department of Surgical Oncology and Gastrointestinal Surgery, Erasmus MC Cancer Institute, University Medical Center Rotterdam, the Netherlands
W. J. Niessen
Faculty of Medical Sciences, University of Groningen, Groningen, the Netherlands
M. Starmans
Department of Radiology and Nuclear Medicine, Erasmus MC Cancer Institute, University Medical Center Rotterdam, the Netherlands
Stefan Klein
Department of Radiology and Nuclear Medicine, Erasmus MC Cancer Institute, University Medical Center Rotterdam, the Netherlands