LLM-Match: An Open-Sourced Patient Matching Model Based on Large Language Models and Retrieval-Augmented Generation

📅 2025-03-17
🤖 AI Summary
Problem: Accurately matching patients to clinical trials remains challenging due to semantic heterogeneity and structural complexity in electronic health records (EHRs) and eligibility criteria. Method: We propose a large language model (LLM)-driven retrieval-augmented generation (RAG) framework that jointly models multi-source EHR semantics and structured trial inclusion/exclusion criteria. Our approach integrates fine-tuned open-weight LLMs, structured prompt engineering, and optimized classification heads to deliver end-to-end, interpretable, and generalizable matching. Contribution/Results: This work introduces an open-LLM-powered RAG approach to clinical trial matching that balances logical traceability with cross-dataset generalization. Evaluated on four established benchmarks (n2c2, SIGIR, TREC 2021, TREC 2022), our method outperforms TrialGPT, zero-shot baselines, and closed-source GPT-4-based models.

📝 Abstract
Patient matching is the process of linking patients to appropriate clinical trials by accurately identifying and matching their medical records with trial eligibility criteria. We propose LLM-Match, a novel framework for patient matching leveraging fine-tuned open-source large language models. Our approach consists of four key components. First, a retrieval-augmented generation (RAG) module extracts relevant patient context from a vast pool of electronic health records (EHRs). Second, a prompt generation module constructs input prompts by integrating trial eligibility criteria (both inclusion and exclusion criteria), patient context, and system instructions. Third, a fine-tuning module with a classification head optimizes the model parameters using structured prompts and ground-truth labels. Fourth, an evaluation module assesses the fine-tuned model's performance on the testing datasets. We evaluated LLM-Match on four open datasets - n2c2, SIGIR, TREC 2021, and TREC 2022 - using open-source models, comparing it against TrialGPT, Zero-Shot, and GPT-4-based closed models. LLM-Match outperformed all baselines.
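The first two components described above (retrieving patient context, then assembling a prompt) can be illustrated with a minimal sketch. It is not the authors' implementation: the bag-of-words cosine score stands in for whatever retriever LLM-Match uses, and the function names, prompt layout, and the "eligible / not eligible" label wording are all assumptions made for illustration.

```python
import re
from collections import Counter
from math import sqrt

def tokenize(text):
    # Lowercase alphanumeric tokens; a real system would likely use embeddings.
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    # Cosine similarity between two token-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(notes, query_text, k=2):
    # Rank EHR notes by similarity to the criteria text and keep the top k.
    query = Counter(tokenize(query_text))
    ranked = sorted(notes, key=lambda n: cosine(Counter(tokenize(n)), query), reverse=True)
    return ranked[:k]

def build_prompt(system, inclusion, exclusion, context):
    # Hypothetical prompt layout: instructions, criteria, then retrieved context.
    lines = [system, "", "Inclusion criteria:"]
    lines += [f"- {c}" for c in inclusion]
    lines += ["", "Exclusion criteria:"]
    lines += [f"- {c}" for c in exclusion]
    lines += ["", "Patient context:"]
    lines += [f"- {n}" for n in context]
    lines += ["", "Answer with one label: eligible or not eligible."]
    return "\n".join(lines)

# Toy data, invented for this example.
notes = [
    "Patient has type 2 diabetes, on metformin.",
    "Family vacation discussed at visit.",
    "HbA1c measured at 8.1 percent last month.",
]
inclusion = ["Type 2 diabetes", "HbA1c above 7 percent"]
exclusion = ["Pregnancy"]

context = retrieve(notes, " ".join(inclusion + exclusion), k=2)
prompt = build_prompt("You are a clinical trial matching assistant.",
                      inclusion, exclusion, context)
print(prompt)
```

In this toy run the unrelated note is filtered out before the prompt is built, which is the role the abstract assigns to the RAG module: keeping the prompt focused on the parts of the record that bear on the criteria.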
Problem

Research questions and friction points this paper is trying to address.

Patient matching for clinical trials using medical records
Leveraging large language models for accurate patient-trial matching
Improving trial eligibility criteria matching with retrieval-augmented generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses retrieval-augmented generation for EHR context extraction
Integrates trial criteria and patient data via prompt generation
Fine-tunes LLMs with structured prompts for classification
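The idea of fine-tuning with a classification head can be sketched in a few lines. This is only a sketch under strong assumptions: the feature vectors below stand in for pooled LLM representations of a structured prompt, the head is a plain logistic-regression layer trained with stochastic gradient descent, and F1 is used for evaluation purely as an example, since the text above does not name the metric. None of the names come from LLM-Match itself.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def score(w, b, x):
    # Linear head: weighted sum of the prompt representation plus bias.
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_head(features, labels, lr=0.5, epochs=200):
    # Logistic-regression head trained with SGD on binary cross-entropy.
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            grad = sigmoid(score(w, b, x)) - y  # dLoss/dz for cross-entropy
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b

def predict(w, b, x):
    # 1 = eligible, 0 = not eligible (label encoding is an assumption).
    return 1 if sigmoid(score(w, b, x)) >= 0.5 else 0

def f1_score(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy stand-ins for pooled prompt representations and ground-truth labels.
features = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]]
labels = [1, 1, 0, 0]

w, b = train_head(features, labels)
preds = [predict(w, b, x) for x in features]
print(preds, f1_score(labels, preds))
```

In an actual fine-tuning setup the LLM weights would be updated together with the head; here only the head is trained, to keep the example self-contained.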
Xiaodi Li
Department of Artificial Intelligence and Informatics Research, Mayo Clinic, Rochester, MN, USA
Shaika Chowdhury
Mayo Clinic
AI for Healthcare, Biomedical Informatics, Machine/Deep Learning, NLP, Precision Medicine
C. Wi
Department of Pediatric and Adolescent Medicine, Mayo Clinic, Rochester, MN, USA
Maria Vassilaki
Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
Ken Liu
Cardiology Department, Mayo Clinic Health System, La Crosse, WI, USA
Terence T Sio
Department of Radiation Oncology, Mayo Clinic, Rochester, MN, USA
Owen Garrick
Clinical Trials, Mayo Clinic, Rochester, MN, USA
Y. Juhn
Department of Pediatric and Adolescent Medicine, Mayo Clinic, Rochester, MN, USA
James R. Cerhan
Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
Cui Tao
Department of AI and Informatics, Mayo Clinic
Knowledge Graph, Information Extraction, Ontology, ML/DL based EHR data analysis, Vaccine
Nansu Zong
Department of Artificial Intelligence and Informatics Research, Mayo Clinic
Biomedical Informatics, Big Data and Linked Data, Machine Learning and Data Mining