🤖 AI Summary
Existing clinical trial outcome prediction models suffer from poor generalizability and high false-positive/negative rates, largely due to overreliance on task- and phase-specific supervised signals. To address this, the authors propose CLaDMoP, a pre-training framework for clinical trial outcome prediction that avoids task-specific loss functions, together with the Successful Clinical Trials (SCT) dataset curated for this purpose. The approach encodes trials' inclusion/exclusion criteria with a Large Language Model and links them to a lightweight drug-molecule branch via multi-level embedding fusion; a grouping block keeps the fusion of long embeddings computationally tractable, and representations are learned through a "pair matching" self-supervised proxy task. Downstream adaptation uses Parameter-Efficient Fine-Tuning (PEFT). On the TOP benchmark, the method achieves up to +10.5% PR-AUC and +3.6% ROC-AUC over prior work, significantly outperforms baselines in zero-shot and few-shot settings, and attains F1 scores comparable to the state-of-the-art supervised model MEXA-CTP. Key contributions: (i) the curated SCT dataset; (ii) a multi-level embedding fusion architecture with efficient grouping; and (iii) a proxy-task-driven pre-training paradigm for clinical trial outcome prediction.
📝 Abstract
Many existing models for clinical trial outcome prediction are optimized using task-specific loss functions on trial phase-specific data. While this scheme may boost prediction for common diseases and drugs, it can hinder learning of generalizable representations, leading to more false positives/negatives. To address this limitation, we introduce CLaDMoP, a new pre-training approach for clinical trial outcome prediction, alongside the Successful Clinical Trials (SCT) dataset, specifically designed for this task. CLaDMoP leverages a Large Language Model to encode trials' eligibility criteria, linked to a lightweight Drug-Molecule branch through a novel multi-level fusion technique. To efficiently fuse long embeddings across levels, we incorporate a grouping block, drastically reducing computational overhead. CLaDMoP avoids reliance on task-specific objectives by pre-training on a "pair matching" proxy task. Compared to established zero-shot and few-shot baselines, our method significantly improves both PR-AUC and ROC-AUC, especially for phase I and phase II trials. We further evaluate and perform ablation on CLaDMoP after Parameter-Efficient Fine-Tuning, comparing it to state-of-the-art supervised baselines, including MEXA-CTP, on the Trial Outcome Prediction (TOP) benchmark. CLaDMoP achieves up to 10.5% improvement in PR-AUC and 3.6% in ROC-AUC, while attaining a comparable F1 score to MEXA-CTP, highlighting its potential for clinical trial outcome prediction. Code and the SCT dataset can be downloaded from https://github.com/murai-lab/CLaDMoP.
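The abstract does not spell out how the "pair matching" proxy task is implemented. A common way to realize such an objective is a symmetric InfoNCE-style contrastive loss that pulls together the embedding of a trial's eligibility criteria and the embedding of its paired drug molecule, while pushing apart mismatched pairs within a batch. The sketch below is an illustrative assumption, not CLaDMoP's actual implementation; the function name `pair_matching_loss` and all shapes are hypothetical.

```python
import numpy as np

def pair_matching_loss(trial_emb, drug_emb, temperature=0.1):
    """Hypothetical symmetric InfoNCE-style objective: row i of trial_emb
    (e.g., an LLM-encoded eligibility-criteria embedding) should match
    row i of drug_emb (e.g., a drug-molecule branch embedding)."""
    # L2-normalize so the dot product is cosine similarity
    t = trial_emb / np.linalg.norm(trial_emb, axis=1, keepdims=True)
    d = drug_emb / np.linalg.norm(drug_emb, axis=1, keepdims=True)
    logits = t @ d.T / temperature  # (B, B) pairwise similarity matrix

    def xent_diag(l):
        # softmax cross-entropy with the "correct" pair on the diagonal
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # average trial->drug and drug->trial directions
    return 0.5 * (xent_diag(logits) + xent_diag(logits.T))

# Toy check: true matched pairs should score a lower loss than random pairings
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))
loss_matched = pair_matching_loss(emb, emb)                      # perfectly aligned pairs
loss_random = pair_matching_loss(emb, rng.normal(size=(4, 8)))   # mismatched pairs
```

The key design point of such a proxy task is that supervision comes from the pairing itself (which criteria belong with which drug), so no outcome labels or task-specific loss is needed during pre-training.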