TrialPanorama: Database and Benchmark for Systematic Review and Design of Clinical Trials

📅 2025-05-22
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Clinical trial AI development suffers from a lack of large-scale, structured, ontology-aligned data. Method: We constructed a structured database covering 1.65 million trials, integrating 15 global sources, and achieved the first full-lifecycle ontology alignment of clinical trials, with entities mapped to UMLS, DrugBank, and MedDRA. From this database we derived a standardized benchmark of eight clinically relevant tasks spanning systematic review and trial design. Our methodology integrates heterogeneous multi-source data, fine-grained ontology mapping, and an LLM-based zero-shot evaluation framework. Contribution/Results: We publicly release both the database and the benchmark. Extensive experiments on five state-of-the-art LLMs reveal significant performance gaps on critical tasks, underscoring the necessity of domain-specific modeling and filling a fundamental gap in AI evaluation for clinical trials.
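To make the zero-shot evaluation setup concrete, the following is a minimal sketch of how one benchmark query (here, trial completion assessment) could be posed to an LLM. This is hypothetical: the released TrialPanorama schema and prompt format are not reproduced here, and all field and function names below are illustrative assumptions rather than the paper's actual interface.

```python
# Hypothetical sketch only: field names and prompt wording are assumptions,
# not the actual TrialPanorama schema or benchmark format.
from dataclasses import dataclass, field
from typing import List


@dataclass
class TrialRecord:
    """Minimal stand-in for one structured trial record."""
    trial_id: str
    condition: str                      # mapped to a UMLS concept in the real database
    intervention: str                   # mapped to a DrugBank entry
    outcomes: List[str] = field(default_factory=list)


def build_zero_shot_prompt(trial: TrialRecord) -> str:
    """Assemble a zero-shot prompt for a trial-completion-assessment query."""
    return (
        "You are assessing a clinical trial.\n"
        f"Condition: {trial.condition}\n"
        f"Intervention: {trial.intervention}\n"
        f"Primary outcomes: {', '.join(trial.outcomes)}\n"
        "Question: Is this trial likely to complete as planned? Answer yes or no."
    )


trial = TrialRecord("NCT00000000", "Type 2 Diabetes", "Metformin", ["HbA1c change"])
prompt = build_zero_shot_prompt(trial)
```

The prompt string would then be sent to each evaluated LLM and the yes/no answer scored against the trial's recorded status; no model-specific API is shown here since the paper evaluates five different LLMs.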

๐Ÿ“ Abstract
Developing artificial intelligence (AI) for vertical domains requires a solid data foundation for both training and evaluation. In this work, we introduce TrialPanorama, a large-scale, structured database comprising 1,657,476 clinical trial records aggregated from 15 global sources. The database captures key aspects of trial design and execution, including trial setups, interventions, conditions, biomarkers, and outcomes, and links them to standard biomedical ontologies such as DrugBank and MedDRA. This structured and ontology-grounded design enables TrialPanorama to serve as a unified, extensible resource for a wide range of clinical trial tasks, including trial planning, design, and summarization. To demonstrate its utility, we derive a suite of benchmark tasks directly from the TrialPanorama database. The benchmark spans eight tasks across two categories: three for systematic review (study search, study screening, and evidence summarization) and five for trial design (arm design, eligibility criteria, endpoint selection, sample size estimation, and trial completion assessment). Experiments using five state-of-the-art large language models (LLMs) show that while general-purpose LLMs exhibit some zero-shot capability, their performance is still inadequate for high-stakes clinical trial workflows. We release the TrialPanorama database and the benchmark to facilitate further research on AI for clinical trials.
Problem

Research questions and friction points this paper is trying to address.

Lack of a large-scale, structured database for training and evaluating clinical trial AI
Need for unified benchmarks covering clinical trial design and systematic review tasks
Inadequate zero-shot performance of general-purpose LLMs in clinical trial workflows
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale structured clinical trial database (1.65M records from 15 global sources)
Ontology-grounded design linking trials to DrugBank and MedDRA
Eight benchmark tasks for evaluating AI on systematic review and trial design