Smart Trial: Evaluating the Use of Large Language Models for Recruiting Clinical Trial Participants via Social Media

📅 2025-09-11
🤖 AI Summary
This study addresses the low efficiency and geographical constraints of traditional clinical trial recruitment by proposing, for the first time, a large language model (LLM)-based paradigm to identify and assess potential participants from social media. We introduce TRIALQA, the first annotated dataset designed for real-world trial eligibility determination, comprising colon and prostate cancer–related social media texts and supporting multi-hop reasoning and participant motivation identification. We systematically evaluate six training and inference strategies across seven state-of-the-art LLMs on eligibility criterion matching and willingness-to-participate classification, with rigorous multi-round human annotation ensuring high data quality. Results demonstrate that LLMs exhibit promising capability in comprehending complex medical criteria but remain limited in multi-step logical reasoning and fine-grained judgment. This work establishes a new benchmark, a novel publicly available dataset, and a reproducible methodology for AI-driven precision clinical trial recruitment.

📝 Abstract
Clinical trials (CTs) are essential for advancing medical research and treatment, yet efficiently recruiting eligible participants, each of whom must meet complex eligibility criteria, remains a significant challenge. Traditional recruitment approaches, such as advertisements or electronic health record screening within hospitals, are often time-consuming and geographically constrained. This work addresses the recruitment challenge by leveraging the vast amount of health-related information individuals share on social media platforms. With the emergence of powerful large language models (LLMs) capable of sophisticated text understanding, we pose the central research question: Can LLM-driven tools facilitate CT recruitment by identifying potential participants through their engagement on social media? To investigate this question, we introduce TRIALQA, a novel dataset comprising two social media collections from subreddits on colon cancer and prostate cancer. Using eligibility criteria from public real-world CTs, experienced annotators label TRIALQA to indicate (1) whether a social media user meets a given eligibility criterion and (2) the user's stated reasons for interest in participating in a CT. We benchmark seven widely used LLMs on these two prediction tasks, employing six distinct training and inference strategies. Our extensive experiments reveal that, while LLMs show considerable promise, they still face challenges in performing the complex, multi-hop reasoning needed to accurately assess eligibility criteria.
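To make the first prediction task concrete, the sketch below frames eligibility assessment as zero-shot classification over a (post, criterion) pair. It is a minimal illustration, not the authors' pipeline: the label set, the `EligibilityExample` fields, and the prompt wording are all assumptions, and no model is actually called.

```python
import re
from dataclasses import dataclass

# Hypothetical label set; the labels actually used in TRIALQA may differ.
LABELS = ("eligible", "not eligible", "unknown")


@dataclass
class EligibilityExample:
    post: str       # social media text written by one user
    criterion: str  # a single eligibility criterion taken from a trial


def build_prompt(example: EligibilityExample) -> str:
    """Format one post/criterion pair as a zero-shot classification prompt."""
    return (
        "Decide whether the author of the post meets the criterion.\n"
        f"Post: {example.post}\n"
        f"Criterion: {example.criterion}\n"
        f"Answer with one of: {', '.join(LABELS)}."
    )


def parse_label(response: str) -> str:
    """Map a free-text model response onto LABELS, defaulting to 'unknown'."""
    text = response.strip().lower()
    # Try longer labels first so "not eligible" is not read as "eligible";
    # word boundaries keep "ineligible" from matching "eligible".
    for label in sorted(LABELS, key=len, reverse=True):
        if re.search(r"\b" + re.escape(label) + r"\b", text):
            return label
    return "unknown"
```

The prompt string would be sent to whichever LLM is under evaluation, and `parse_label` would turn its reply into a label that can be compared with the human annotation.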
Problem

Research questions and friction points this paper is trying to address.

Using LLMs to recruit clinical trial participants via social media
Evaluating LLM performance on eligibility assessment from social media data
Addressing complex multi-hop reasoning for clinical trial criteria
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-driven social media screening for recruitment
TRIALQA dataset with annotated eligibility criteria
Multi-hop reasoning evaluation on seven LLMs
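Benchmarking several LLMs on the same labeled examples comes down to comparing predicted labels with the annotations. The sketch below computes accuracy and macro-averaged F1 in plain Python; these metrics are an assumption for illustration, since this page does not say which metrics the paper reports.

```python
from collections import Counter


def accuracy(gold, pred):
    """Fraction of examples where the prediction equals the annotation."""
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over labels seen in gold or pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed
    labels = set(gold) | set(pred)
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0
```

Macro averaging weights every label equally, which matters when one outcome (for example, a criterion that cannot be judged from the post) dominates the data.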
Xiaofan Zhou
University of Illinois at Chicago, Chicago, Illinois, USA
Zisu Wang
Colorado State University, Fort Collins, Colorado, USA
Janice Krieger
Mayo Clinic, Jacksonville, Florida, USA
Mohan Zalake
University of Illinois
Human-Computer Interaction, Virtual Agents, Digital Health, Virtual Reality, Adaptive Technologies
Lu Cheng
Assistant Professor, UIC CS
Socially Responsible AI, Causal Machine Learning, Data Mining, AI for Good