AutoCT: Automating Interpretable Clinical Trial Prediction with LLM Agents

📅 2025-06-04
🤖 AI Summary
Problem: Deep learning models for clinical trial outcome prediction suffer from poor interpretability and susceptibility to label leakage, which limits their deployability in high-stakes healthcare settings. Method: This paper introduces AutoCT, an LLM-driven, self-iterating feature engineering framework that combines the reasoning capabilities of large language models with Monte Carlo Tree Search (MCTS) to autonomously generate, evaluate, and refine tabular features from public information, without human intervention. Contribution/Results: Because predictions come from a classical machine learning model over explicit tabular features, the resulting decisions are interpretable, and the framework is designed to avoid the label leakage that affects black-box deep learning approaches. AutoCT matches or surpasses state-of-the-art methods on clinical trial prediction tasks within a limited number of self-refinement iterations.

📝 Abstract
Clinical trials are critical for advancing medical treatments but remain prohibitively expensive and time-consuming. Accurate prediction of clinical trial outcomes can significantly reduce research and development costs and accelerate drug discovery. While recent deep learning models have shown promise by leveraging unstructured data, their black-box nature, lack of interpretability, and vulnerability to label leakage limit their practical use in high-stakes biomedical contexts. In this work, we propose AutoCT, a novel framework that combines the reasoning capabilities of large language models with the explainability of classical machine learning. AutoCT autonomously generates, evaluates, and refines tabular features based on public information without human input. Our method uses Monte Carlo Tree Search to iteratively optimize predictive performance. Experimental results show that AutoCT performs on par with or better than SOTA methods on clinical trial prediction tasks within only a limited number of self-refinement iterations, establishing a new paradigm for scalable, interpretable, and cost-efficient clinical trial prediction.
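The generate, evaluate, and refine loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `propose_feature` stands in for an LLM call, `evaluate` stands in for training and validating a classical model, and the feature names, toy gains, UCB1 selection rule, and `max_children` limit are all hypothetical choices made here for illustration.

```python
import math
import random
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

# Hypothetical candidate features; in the paper an LLM proposes these.
CANDIDATE_FEATURES = ["phase", "enrollment_size", "num_sites",
                      "sponsor_type", "prior_trials_for_drug"]
# Invented per-feature gains used only by the toy evaluator below.
TOY_GAIN = {"phase": 0.10, "enrollment_size": 0.05, "num_sites": 0.0,
            "sponsor_type": 0.03, "prior_trials_for_drug": 0.08}


def propose_feature(current: Tuple[str, ...], tried: Set[str],
                    rng: random.Random) -> Optional[str]:
    """Stand-in for an LLM proposal: pick an unused, untried feature."""
    unused = [f for f in CANDIDATE_FEATURES if f not in current and f not in tried]
    return rng.choice(unused) if unused else None


def evaluate(features: Tuple[str, ...]) -> float:
    """Stand-in for a validation score; penalizes each added feature slightly."""
    return 0.5 + sum(TOY_GAIN[f] for f in features) - 0.01 * len(features)


@dataclass
class Node:
    features: Tuple[str, ...]
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    total_reward: float = 0.0


def ucb1(child: Node, parent_visits: int, c: float = 1.4) -> float:
    """Mean reward plus an exploration bonus (assumed selection rule)."""
    if child.visits == 0:
        return math.inf
    mean = child.total_reward / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def is_fully_expanded(node: Node, max_children: int) -> bool:
    unused = len(CANDIDATE_FEATURES) - len(node.features)
    return len(node.children) >= min(max_children, unused)


def search(iterations: int = 30, max_children: int = 2, seed: int = 0):
    rng = random.Random(seed)
    root = Node(features=())
    best_features, best_score = root.features, evaluate(root.features)
    for _ in range(iterations):
        node = root
        # Selection: descend through fully expanded nodes by UCB1.
        while node.children and is_fully_expanded(node, max_children):
            parent = node
            node = max(parent.children, key=lambda ch: ucb1(ch, parent.visits))
        # Expansion: ask the (stubbed) proposer for one new feature.
        tried = {child.features[-1] for child in node.children}
        new_feature = propose_feature(node.features, tried, rng)
        if new_feature is not None:
            child = Node(features=node.features + (new_feature,), parent=node)
            node.children.append(child)
            node = child
        # Evaluation: score the feature set at the reached node.
        reward = evaluate(node.features)
        if reward > best_score:
            best_features, best_score = node.features, reward
        # Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    return best_features, best_score


if __name__ == "__main__":
    print(search())
```

In a real system the evaluator would be the expensive step, which is why a tree search that reuses statistics across iterations is attractive when only a limited number of refinement iterations is affordable.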
Problem

Research questions and friction points this paper is trying to address.

Automating clinical trial outcome prediction using LLMs
Improving interpretability in clinical trial prediction models
Reducing cost and time in the drug discovery process
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines LLM reasoning with explainable ML
Autonomously generates and refines tabular features
Uses Monte Carlo Tree Search for optimization
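
To illustrate why a classical model over tabular features is easier to inspect than a black-box network, here is a small sketch of a one-rule classifier (a decision stump) whose fitted form can be read directly. The rows, labels, and feature names are synthetic and invented for this example; the stump is a stand-in for whichever classical model the paper actually uses, which this summary does not specify.

```python
from typing import Dict, List, Tuple

# Synthetic toy rows, invented for illustration only; not real trial data.
ROWS: List[Dict[str, int]] = [
    {"enrollment_size": 40, "num_sites": 2},
    {"enrollment_size": 60, "num_sites": 3},
    {"enrollment_size": 300, "num_sites": 1},
    {"enrollment_size": 450, "num_sites": 5},
    {"enrollment_size": 500, "num_sites": 2},
    {"enrollment_size": 80, "num_sites": 4},
]
LABELS: List[int] = [0, 0, 1, 1, 1, 0]  # 1 = success, 0 = failure (synthetic)


def fit_stump(rows: List[Dict[str, int]],
              labels: List[int]) -> Tuple[float, str, int]:
    """Find the single rule `feature >= threshold` with the best training accuracy."""
    best = None  # (accuracy, feature, threshold)
    for feature in rows[0]:
        for threshold in sorted({row[feature] for row in rows}):
            preds = [1 if row[feature] >= threshold else 0 for row in rows]
            acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
            if best is None or acc > best[0]:
                best = (acc, feature, threshold)
    return best


def explain(stump: Tuple[float, str, int]) -> str:
    """Render the fitted rule as a sentence a reviewer can check by hand."""
    acc, feature, threshold = stump
    return (f"predict success if {feature} >= {threshold} "
            f"(training accuracy {acc:.2f})")


if __name__ == "__main__":
    print(explain(fit_stump(ROWS, LABELS)))
```

Because the whole model is one named feature and one threshold, a domain expert can judge whether the rule is plausible or whether it points to a leaked or spurious feature.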