Improving Clinical Trial Recruitment using Clinical Narratives and Large Language Models

📅 2026-04-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the inefficiency of clinical trial patient screening, a major cause of recruitment failure. It systematically evaluates general-purpose and medical large language models, including decoder-based generative models such as MedGemma, on screening long clinical narrative texts, highlighting the "Lost in the Middle" information-loss problem that arises with long documents. To mitigate this limitation, the authors compare three strategies: the models' original long-context windows, named entity recognition (NER)-based extractive summarization, and retrieval-augmented generation (RAG). Experimental results show that MedGemma combined with RAG achieves a micro-F1 score of 89.05% on the 2018 N2C2 Track 1 dataset and clearly outperforms baselines on eligibility criteria that require cross-paragraph reasoning. This work points toward efficient, automated patient screening for clinical trials.
📝 Abstract
Screening patients for enrollment is a well-known, labor-intensive bottleneck that leads to under-enrollment and, ultimately, trial failures. Recent breakthroughs in large language models (LLMs) offer a promising opportunity to use artificial intelligence to improve screening. This study systematically explored both encoder-based and decoder-based generative LLMs for screening clinical narratives to facilitate clinical trial recruitment. We examined both general-purpose and medical-adapted LLMs and explored three strategies to alleviate the "Lost in the Middle" issue when handling long documents: 1) original long-context: using the default context windows of LLMs; 2) NER-based extractive summarization: condensing the long document into summaries using named entity recognition; 3) RAG: dynamic evidence retrieval based on eligibility criteria. The 2018 N2C2 Track 1 benchmark dataset was used for evaluation. Our experimental results show that the MedGemma model with the RAG strategy achieved the best micro-F1 score of 89.05%, outperforming the other models. Generative LLMs markedly improved performance on trial criteria that require reasoning across long documents, whereas criteria that span a short piece of context (e.g., lab tests) showed only incremental gains. Real-world adoption of LLMs for trial recruitment must consider specific criteria when selecting among rule-based queries, encoder-based LLMs, and generative LLMs to maximize efficiency within reasonable computing costs.
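The RAG strategy described in the abstract can be sketched as follows: split a clinical note into chunks, retrieve the chunks most relevant to each eligibility criterion, and pass only that evidence to an LLM for a MET / NOT MET decision. This is a minimal illustration, not the authors' implementation: the lexical-overlap `cosine` scorer stands in for a real embedding model, and `build_prompt` returns the assembled prompt rather than calling a model.

```python
# Hypothetical sketch of criterion-driven evidence retrieval (RAG) for
# eligibility screening. All function names and parameters are illustrative.
from collections import Counter
import math
import re

def chunk_note(note: str, size: int = 40) -> list[str]:
    """Split a clinical narrative into fixed-size word windows."""
    words = note.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def _tf(text: str) -> Counter:
    """Term-frequency vector over lowercase alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: str, b: str) -> float:
    """Cosine similarity of term-frequency vectors (embedding stand-in)."""
    ta, tb = _tf(a), _tf(b)
    dot = sum(ta[w] * tb[w] for w in ta)
    na = math.sqrt(sum(v * v for v in ta.values()))
    nb = math.sqrt(sum(v * v for v in tb.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(criterion: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k note chunks most similar to the eligibility criterion."""
    return sorted(chunks, key=lambda c: cosine(criterion, c), reverse=True)[:k]

def build_prompt(criterion: str, note: str, k: int = 2) -> str:
    """Assemble an LLM prompt containing only the retrieved evidence."""
    evidence = retrieve(criterion, chunk_note(note), k)
    return (
        f"Criterion: {criterion}\n"
        + "".join(f"Evidence: {e}\n" for e in evidence)
        + "Answer MET or NOT MET."
    )
```

Retrieving per criterion keeps the decisive evidence near the top of the context regardless of where it sits in the note, which is the point of the RAG strategy for the "Lost in the Middle" problem.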
Problem

Research questions and friction points this paper is trying to address.

clinical trial recruitment
patient screening
eligibility criteria
clinical narratives
under-enrollment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models
Clinical Trial Recruitment
Retrieval-Augmented Generation
Named Entity Recognition
Long-context Processing
Ziyi Chen
Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL
Mengxian Lyu
Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL
Cheng Peng
Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL
Yonghui Wu
Associate Professor, University of Florida
Natural Language Processing · Machine Learning · Medical Informatics · Pharmacovigilance