🤖 AI Summary
This work addresses the lack of reliable, uncontaminated benchmarks for clinical trial outcome prediction, which has hindered rigorous assessment of AI systems' ability to forecast real-world events. To this end, we introduce CT Open, an open, live evaluation platform that runs four challenges per year and scores submissions only on trials whose outcomes were not yet public when the predictions were submitted. We propose a fully automated decontamination pipeline that uses iterative, large language model-driven web search to determine the earliest public disclosure of each trial's outcome, and we validate it against human expert annotations. The project releases a training dataset alongside two time-stamped test benchmarks, Winter 2025 and Summer 2025, providing a fair and reproducible framework for evaluating AI-driven forecasting of real-world clinical events.
📝 Abstract
Scientists have long sought to accurately predict outcomes of real-world events before they happen. Can AI systems do so more reliably? We study this question through clinical trial outcome prediction, a high-stakes open challenge even for domain experts. We introduce CT Open, an open-access, live platform that will run four challenges every year. Anyone can submit predictions for each challenge. CT Open evaluates those submissions on trials whose outcomes were not yet public at the time of submission but were made public afterwards. Determining whether a trial's outcome was public on the internet before a given date is surprisingly difficult. Outcomes posted on official registries may lag by years, while the first mention may appear in obscure articles. To address this, we propose a novel, fully automated decontamination pipeline that uses iterative LLM-powered web search to identify the earliest mention of trial outcomes. We validate the pipeline's quality and accuracy against human expert annotations. Because CT Open's pipeline ensures that every evaluated trial had no publicly reported outcome when the prediction was made, participants may use any methodology and any data source. In this paper, we release a training set and two time-stamped test benchmarks, Winter 2025 and Summer 2025. We believe CT Open can serve as a central hub for advancing AI research on forecasting real-world outcomes before they occur, while also informing biomedical research and improving clinical trial design. The CT Open platform is hosted at https://ct-open.net/.
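The evaluation rule described in the abstract (score a prediction only if the trial's outcome became public after the submission) can be illustrated with a minimal sketch. This is not the CT Open implementation: the `Trial` class, its field names, and the `is_evaluable` function are hypothetical, and the sketch assumes the decontamination pipeline has already produced an earliest-mention date for each trial. Treating a same-day mention as contaminated is likewise an assumption made here for conservatism.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Trial:
    """Hypothetical record; field names are illustrative, not CT Open's schema."""
    trial_id: str
    # Earliest date on which the outcome was publicly mentioned,
    # or None if no public mention has been found.
    earliest_public_mention: Optional[date]


def is_evaluable(trial: Trial, submitted_on: date, evaluated_on: date) -> bool:
    """Return True if the outcome was not public at submission time
    but had become public by the evaluation date."""
    mention = trial.earliest_public_mention
    if mention is None:
        # No known outcome yet, so there is nothing to score against.
        return False
    # Assumption: a mention on the submission date itself counts as contaminated.
    return submitted_on < mention <= evaluated_on


trial = Trial("EXAMPLE-001", earliest_public_mention=date(2025, 3, 1))
print(is_evaluable(trial, submitted_on=date(2025, 1, 15), evaluated_on=date(2025, 6, 1)))
```

Under this rule, a trial whose outcome was first mentioned before the submission date is excluded from scoring, which is what allows participants to use any data source without risk of leakage.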