🤖 AI Summary
Accurately matching patients to clinical trials remains challenging due to semantic heterogeneity and structural complexity in electronic health records (EHRs) and eligibility criteria.
Method: We propose a large language model (LLM)-driven retrieval-augmented generation (RAG) framework that jointly models multi-source EHR semantics and structured trial inclusion/exclusion criteria. Our approach integrates fine-tuned open-weight LLMs, structured prompt engineering, and optimized classification heads to deliver end-to-end, interpretable, and generalizable matching.
Contribution/Results: This work introduces the first RAG paradigm for clinical trial matching powered by open-weight LLMs, balancing logical traceability with cross-dataset generalization. Evaluated on four established benchmarks (n2c2, SIGIR, TREC 2021, TREC 2022), our method significantly outperforms TrialGPT, zero-shot baselines, and the closed-source GPT-4, demonstrating both state-of-the-art performance and practical viability.
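The retrieval step of the framework above can be illustrated with a toy sketch: rank EHR note chunks by similarity to the trial criteria and keep the top-k as patient context. Bag-of-words vectors stand in for real embeddings here, and all names (`embed`, `retrieve`, the sample notes) are illustrative assumptions, not the authors' implementation.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would use a neural encoder.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Return the k chunks most similar to the query (e.g. trial criteria).
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

notes = [
    "history of type 2 diabetes, metformin 1000 mg daily",
    "left knee arthroscopy in 2015, no complications",
    "HbA1c 8.1 percent on last visit",
]
context = retrieve("type 2 diabetes HbA1c", notes, k=2)
```

In the actual framework, the retrieved chunks would come from a much larger EHR pool and feed into the structured prompt rather than being used directly.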
📝 Abstract
Patient matching is the process of linking patients to appropriate clinical trials by accurately identifying and matching their medical records with trial eligibility criteria. We propose LLM-Match, a novel framework for patient matching leveraging fine-tuned open-source large language models. Our approach consists of four key components. First, a retrieval-augmented generation (RAG) module extracts relevant patient context from a vast pool of electronic health records (EHRs). Second, a prompt generation module constructs input prompts by integrating trial eligibility criteria (both inclusion and exclusion criteria), patient context, and system instructions. Third, a fine-tuning module with a classification head optimizes the model parameters using structured prompts and ground-truth labels. Fourth, an evaluation module assesses the fine-tuned model's performance on the testing datasets. We evaluated LLM-Match on four open datasets (n2c2, SIGIR, TREC 2021, and TREC 2022) using open-source models, comparing it against TrialGPT, zero-shot prompting, and closed-source GPT-4-based baselines. LLM-Match outperformed all baselines.
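The prompt-generation module described above can be sketched as a function that assembles system instructions, retrieved patient context, and the trial's inclusion/exclusion criteria into one structured prompt. The function name, instruction wording, and field layout here are illustrative assumptions, not the authors' actual template.

```python
# Hypothetical sketch of the prompt-generation module: combine system
# instructions, retrieved EHR context, and trial criteria into one prompt.
SYSTEM_INSTRUCTIONS = (
    "You are a clinical trial matching assistant. Decide whether the "
    "patient meets the trial's eligibility criteria."
)

def build_prompt(patient_context: str,
                 inclusion: list[str],
                 exclusion: list[str]) -> str:
    """Assemble a structured prompt from patient context and criteria."""
    inc = "\n".join(f"- {c}" for c in inclusion)
    exc = "\n".join(f"- {c}" for c in exclusion)
    return (
        f"{SYSTEM_INSTRUCTIONS}\n\n"
        f"Patient context (retrieved from EHR):\n{patient_context}\n\n"
        f"Inclusion criteria:\n{inc}\n\n"
        f"Exclusion criteria:\n{exc}\n\n"
        "Answer: eligible or ineligible?"
    )

prompt = build_prompt(
    "62-year-old with type 2 diabetes, HbA1c 8.1%.",
    ["Adults aged 18-75", "Diagnosed type 2 diabetes"],
    ["Pregnancy", "End-stage renal disease"],
)
```

In the full pipeline, prompts like this one, paired with ground-truth eligibility labels, would be the training inputs for the fine-tuning module with its classification head.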