Smart Trial: Evaluating the Use of Large Language Models for Recruiting Clinical Trial Participants via Social Media

📅 2025-09-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the low efficiency and geographical constraints of traditional clinical trial recruitment by proposing, for the first time, a large language model (LLM)-based paradigm to identify and assess potential participants from social media. We introduce TRIALQA, the first annotated dataset designed for real-world trial eligibility determination, comprising colon and prostate cancer–related social media texts and supporting multi-hop reasoning and participant motivation identification. We systematically evaluate six training and inference strategies across seven state-of-the-art LLMs on eligibility criterion matching and willingness-to-participate classification, with rigorous multi-round human annotation ensuring high data quality. Results demonstrate that LLMs exhibit promising capability in comprehending complex medical criteria but remain limited in multi-step logical reasoning and fine-grained judgment. This work establishes a new benchmark, a novel publicly available dataset, and a reproducible methodology for AI-driven precision clinical trial recruitment.

📝 Abstract

Clinical trials (CTs) are essential for advancing medical research and treatment, yet efficiently recruiting eligible participants, each of whom must meet complex eligibility criteria, remains a significant challenge. Traditional recruitment approaches, such as advertisements or electronic health record screening within hospitals, are often time-consuming and geographically constrained. This work addresses the recruitment challenge by leveraging the vast amount of health-related information individuals share on social media platforms. With the emergence of powerful large language models (LLMs) capable of sophisticated text understanding, we pose the central research question: Can LLM-driven tools facilitate CT recruitment by identifying potential participants through their engagement on social media? To investigate this question, we introduce TRIALQA, a novel dataset comprising two social media collections drawn from subreddits on colon cancer and prostate cancer. Using eligibility criteria from public real-world CTs, experienced annotators label TRIALQA to indicate (1) whether a social media user meets a given eligibility criterion and (2) the user's stated reasons for interest in participating in a CT. We benchmark seven widely used LLMs on these two prediction tasks, employing six distinct training and inference strategies. Our extensive experiments reveal that, while LLMs show considerable promise, they still face challenges in performing the complex, multi-hop reasoning needed to accurately assess eligibility criteria.
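To make the eligibility-matching task concrete, here is a minimal Python sketch of how one (post, criterion) pair might be represented, turned into a zero-shot prompt, and scored. It is an illustration under stated assumptions, not the paper's pipeline: the label set, the field names, the prompt wording, and the helpers `build_prompt`, `parse_label`, and `accuracy` are all hypothetical, and the abstract does not specify the actual TRIALQA schema or evaluation metric.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical label set; the actual TRIALQA annotation scheme may differ.
LABELS = ("eligible", "not_eligible", "unknown")


@dataclass
class EligibilityExample:
    """One (post, criterion) pair with a gold label. Field names are illustrative."""
    post: str
    criterion: str
    gold: str


def build_prompt(example: EligibilityExample) -> str:
    """Format a zero-shot prompt asking whether the post's author meets the criterion."""
    return (
        "Social media post:\n"
        f"{example.post}\n\n"
        "Eligibility criterion:\n"
        f"{example.criterion}\n\n"
        f"Answer with one of: {', '.join(LABELS)}."
    )


def parse_label(response: str) -> str:
    """Map a free-text model response to a label, defaulting to 'unknown'."""
    text = response.strip().lower().replace(" ", "_")
    # Check longer labels first so "not_eligible" is not matched as "eligible".
    for label in sorted(LABELS, key=len, reverse=True):
        if label in text:
            return label
    return "unknown"


def accuracy(golds: List[str], preds: List[str]) -> float:
    """Fraction of predictions that exactly match the gold labels."""
    if len(golds) != len(preds):
        raise ValueError("golds and preds must have the same length")
    if not golds:
        return 0.0
    return sum(g == p for g, p in zip(golds, preds)) / len(golds)
```

In use, the string returned by `build_prompt` would be sent to whichever LLM is being benchmarked, the reply passed through `parse_label`, and the resulting predictions compared with the annotators' labels via `accuracy`. The model call itself is deliberately left out, since it depends on the provider and on which of the training or inference strategies is applied.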

Problem

Research questions and friction points this paper is trying to address.

Using LLMs to recruit clinical trial participants via social media

Evaluating LLM performance on eligibility assessment from social media data

Addressing complex multi-hop reasoning for clinical trial criteria

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-driven social media screening for recruitment

TRIALQA dataset with annotated eligibility criteria

Multi-hop reasoning evaluation on seven LLMs
