TrialMatchAI: An End-to-End AI-powered Clinical Trial Recommendation System to Streamline Patient-to-Trial Matching

📅 2025-05-13
🤖 AI Summary
Patient recruitment for clinical trials is persistently hindered by inefficient manual screening and by the difficulty of integrating heterogeneous data, particularly structured electronic health records and unstructured clinician notes. To address this, we propose an end-to-end, interpretable AI system that introduces medical Chain-of-Thought–guided eligibility reasoning, presented as novel in clinical trial matching, for transparent, stepwise eligibility assessment. The method integrates biomedical entity normalization, hybrid lexical-semantic retrieval, and a fine-tuned open-source large language model within a retrieval-augmented generation (RAG) framework, while supporting Phenopackets interoperability and privacy-preserving on-premises deployment. A modular architecture keeps model components replaceable and decisions transparent. In real-world oncology validation, 92% of patients had at least one relevant trial among their top 20 recommendations; domain experts confirmed over 90% accuracy in criterion-level eligibility classification, with especially strong performance on biomarker-based eligibility.

📝 Abstract
Patient recruitment remains a major bottleneck in clinical trials, calling for scalable and automated solutions. We present TrialMatchAI, an AI-powered recommendation system that automates patient-to-trial matching by processing heterogeneous clinical data, including structured records and unstructured physician notes. Built on fine-tuned, open-source large language models (LLMs) within a retrieval-augmented generation framework, TrialMatchAI ensures transparency and reproducibility and maintains a lightweight deployment footprint suitable for clinical environments. The system normalizes biomedical entities, retrieves relevant trials using a hybrid search strategy combining lexical and semantic similarity, re-ranks results, and performs criterion-level eligibility assessments using medical Chain-of-Thought reasoning. This pipeline delivers explainable outputs with traceable decision rationales. In real-world validation, 92 percent of oncology patients had at least one relevant trial retrieved within the top 20 recommendations. Evaluation across synthetic and real clinical datasets confirmed state-of-the-art performance, with expert assessment validating over 90 percent accuracy in criterion-level eligibility classification, particularly excelling in biomarker-driven matches. Designed for modularity and privacy, TrialMatchAI supports Phenopackets-standardized data, enables secure local deployment, and allows seamless replacement of LLM components as more advanced models emerge. By enhancing efficiency and interpretability and offering lightweight, open-source deployment, TrialMatchAI provides a scalable solution for AI-driven clinical trial matching in precision medicine.
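The headline retrieval figure (the share of patients with at least one relevant trial among the top 20 recommendations) is a per-patient hit rate rather than a classical recall. The sketch below shows one way such a figure could be computed; the function name `hit_rate_at_k`, the data layout, and the toy patient and trial IDs are assumptions made for illustration and are not taken from the paper.

```python
def hit_rate_at_k(ranked_by_patient, relevant_by_patient, k=20):
    """Fraction of patients with at least one relevant trial in their top k.

    ranked_by_patient:   dict mapping patient ID -> list of trial IDs, best first
    relevant_by_patient: dict mapping patient ID -> set of relevant trial IDs
    (Both structures are hypothetical; the paper does not specify a format.)
    """
    if not ranked_by_patient:
        return 0.0
    hits = 0
    for patient, ranked in ranked_by_patient.items():
        relevant = relevant_by_patient.get(patient, set())
        # A patient counts as a hit if any of the top-k trials is relevant.
        if any(trial in relevant for trial in ranked[:k]):
            hits += 1
    return hits / len(ranked_by_patient)


# Toy data: two patients, three recommendations each.
ranked = {"p1": ["t1", "t2", "t3"], "p2": ["t4", "t5", "t6"]}
relevant = {"p1": {"t3"}, "p2": {"t9"}}

rate_top3 = hit_rate_at_k(ranked, relevant, k=3)  # p1 is a hit, p2 is not
rate_top2 = hit_rate_at_k(ranked, relevant, k=2)  # neither patient is a hit
```

Under this definition, a patient with many relevant trials counts the same as a patient with one, which is why the figure should not be read as the fraction of all relevant trials retrieved.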
Problem

Research questions and friction points this paper is trying to address.

Automates patient-to-trial matching using AI and clinical data
Ensures transparency and reproducibility with lightweight deployment
Improves efficiency and accuracy in clinical trial recruitment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses fine-tuned LLMs for clinical data processing
Hybrid search strategy combining lexical and semantic similarity
Modular design with privacy-focused local deployment
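The hybrid search idea listed above can be illustrated by fusing a lexical ranking and a semantic ranking into a single list. The summary does not say how TrialMatchAI combines the two signals, so the sketch below swaps in reciprocal rank fusion (RRF), a common generic technique, purely as an assumption; the function name, the constant `k=60`, and the trial IDs are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into one list, best first.

    Each item receives 1 / (k + rank) from every list it appears in,
    so items ranked highly by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical outputs of a lexical retriever and a semantic retriever.
lexical = ["NCT-A", "NCT-B", "NCT-C"]
semantic = ["NCT-B", "NCT-D", "NCT-A"]

fused = reciprocal_rank_fusion([lexical, semantic])
```

Rank-based fusion avoids having to calibrate lexical scores against embedding similarities, which live on different scales; a weighted sum of normalized scores would be an equally plausible alternative.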
Majd Abdallah
University of Bordeaux, CNRS, IBGC UMR 5095, 146 Rue Léo Saignat, 33000 Bordeaux, France; University of Bordeaux, Bordeaux Bioinformatics Center, 146 Rue Léo Saignat, 33000 Bordeaux, France
Sigve Nakken
Department of Tumor Biology, Institute of Cancer Research, The Norwegian Radium Hospital, Oslo University Hospital, Oslo, Norway; Centre for Cancer Cell Reprogramming, Institute of Clinical Medicine, Faculty of Medicine, University of Oslo, 0379 Oslo, Norway; Department of Informatics, University of Oslo, 0316 Oslo, Norway
Mariska Bierkens
Department of Pathology, The Netherlands Cancer Institute, Amsterdam, The Netherlands
Johanna Galvis
University of Bordeaux, CNRS, IBGC UMR 5095, 146 Rue Léo Saignat, 33000 Bordeaux, France; University of Bordeaux, Bordeaux Bioinformatics Center, 146 Rue Léo Saignat, 33000 Bordeaux, France
Alexis Groppi
University of Bordeaux, CNRS, IBGC UMR 5095, 146 Rue Léo Saignat, 33000 Bordeaux, France; University of Bordeaux, Bordeaux Bioinformatics Center, 146 Rue Léo Saignat, 33000 Bordeaux, France
Slim Karkar
University of Bordeaux, CNRS, IBGC UMR 5095, 146 Rue Léo Saignat, 33000 Bordeaux, France; University of Bordeaux, Bordeaux Bioinformatics Center, 146 Rue Léo Saignat, 33000 Bordeaux, France
Lana Meiqari
Department of Pathology, The Netherlands Cancer Institute, Amsterdam, The Netherlands
Maria Alexandra Rujano
European Clinical Research Infrastructure Network (ECRIN), Boulevard Saint Jacques 30, 75014, Paris, France
Steve Canham
European Clinical Research Infrastructure Network (ECRIN), Boulevard Saint Jacques 30, 75014, Paris, France
Rodrigo Dienstmann
Oncology Data Science (ODysSey) Group, Vall d’Hebron Institute of Oncology (VHIO), Barcelona, Spain; University of Vic - Central University of Catalonia, Barcelona, Spain
Remond Fijneman
Department of Pathology, The Netherlands Cancer Institute, Amsterdam, The Netherlands
Eivind Hovig
Professor. 1. Centre for bioinformatics, Dept. of Informatics, Univ. of Oslo. 2. Oslo Univ. Hosp.
Cancer genomics, DNA variation, bioinformatics
Gerrit Meijer
Department of Pathology, The Netherlands Cancer Institute, Amsterdam, The Netherlands
Macha Nikolski
University of Bordeaux, CNRS, IBGC UMR 5095, 146 Rue Léo Saignat, 33000 Bordeaux, France; University of Bordeaux, Bordeaux Bioinformatics Center, 146 Rue Léo Saignat, 33000 Bordeaux, France