CLaDMoP: Learning Transferrable Models from Successful Clinical Trials via LLMs

📅 2025-05-24
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing clinical trial outcome prediction models suffer from poor generalizability and high false-positive/negative rates, primarily due to overreliance on task- and phase-specific supervised signals. To address this, we propose the first pretraining framework tailored to Successful Clinical Trial (SCT) data—eliminating task-specific loss functions. Our approach leverages LLM-encoded inclusion/exclusion criteria and lightweight molecular branch embeddings, fused across multiple levels; representation learning is driven by grouped aggregation and a novel “pairwise matching” self-supervised proxy task. Downstream adaptation employs parameter-efficient fine-tuning (PEFT). On the TOP benchmark, our method achieves +10.5% PR-AUC and +3.6% ROC-AUC improvements over prior work. It significantly outperforms baselines in zero-shot and few-shot settings, attaining F1 scores comparable to the state-of-the-art supervised model MEXA-CTP. Key contributions include: (i) the first curated SCT dataset; (ii) a multi-level embedding fusion architecture; and (iii) an unsupervised, proxy-task-driven pretraining paradigm for clinical trial outcome prediction.
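The "pairwise matching" proxy task described above can be sketched as a contrastive objective: each trial's criteria embedding should score highest against its own drug-molecule embedding within a batch. The summary does not specify the loss, so the InfoNCE-style formulation, function names, and temperature below are illustrative assumptions, not the paper's exact objective.

```python
import numpy as np

def pair_matching_loss(trial_emb, mol_emb, temperature=0.1):
    """Illustrative 'pair matching' objective (assumed InfoNCE-style):
    each trial embedding should be most similar to its own molecule
    embedding (the diagonal of the similarity matrix) versus all other
    molecules in the batch."""
    # L2-normalize so the dot product is cosine similarity
    t = trial_emb / np.linalg.norm(trial_emb, axis=1, keepdims=True)
    m = mol_emb / np.linalg.norm(mol_emb, axis=1, keepdims=True)
    logits = t @ m.T / temperature  # (batch, batch) similarity matrix
    # Softmax cross-entropy with the matching pair (diagonal) as target
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
e = rng.normal(size=(8, 16))
# Matched pairs (identical embeddings) give a near-zero loss...
low = pair_matching_loss(e, e)
# ...while mismatched (shuffled) pairs give a higher loss.
high = pair_matching_loss(e, e[::-1])
assert low < high
```

No labels are needed: correspondence between a trial and its drug is the supervision signal, which is what lets pretraining avoid task- and phase-specific loss functions.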


📝 Abstract
Many existing models for clinical trial outcome prediction are optimized using task-specific loss functions on trial phase-specific data. While this scheme may boost prediction for common diseases and drugs, it can hinder learning of generalizable representations, leading to more false positives/negatives. To address this limitation, we introduce CLaDMoP, a new pre-training approach for clinical trial outcome prediction, alongside the Successful Clinical Trials dataset (SCT), specifically designed for this task. CLaDMoP leverages a Large Language Model to encode trials' eligibility criteria, linked to a lightweight Drug-Molecule branch through a novel multi-level fusion technique. To efficiently fuse long embeddings across levels, we incorporate a grouping block, drastically reducing computational overhead. CLaDMoP avoids reliance on task-specific objectives by pre-training on a "pair matching" proxy task. Compared to established zero-shot and few-shot baselines, our method significantly improves both PR-AUC and ROC-AUC, especially for phase I and phase II trials. We further evaluate and perform ablation on CLaDMoP after Parameter-Efficient Fine-Tuning, comparing it to state-of-the-art supervised baselines, including MEXA-CTP, on the Trial Outcome Prediction (TOP) benchmark. CLaDMoP achieves up to 10.5% improvement in PR-AUC and 3.6% in ROC-AUC, while attaining comparable F1 score to MEXA-CTP, highlighting its potential for clinical trial outcome prediction. Code and SCT dataset can be downloaded from https://github.com/murai-lab/CLaDMoP.
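The grouping block's stated purpose is to make fusion over long LLM embeddings cheap. A minimal sketch, assuming simple mean-pooling of consecutive tokens (the paper's exact pooling scheme may differ):

```python
import numpy as np

def group_tokens(token_emb, group_size):
    """Mean-pool consecutive token embeddings into groups, so downstream
    fusion attends over n/group_size positions instead of n. This is one
    plausible reading of the grouping block, not the paper's exact design."""
    n, d = token_emb.shape
    assert n % group_size == 0  # assume the sequence is padded upstream
    return token_emb.reshape(n // group_size, group_size, d).mean(axis=1)

# A 1000-token eligibility-criteria embedding shrinks to 125 positions,
# cutting the quadratic cost of cross-attention fusion by ~64x (8**2).
x = np.random.default_rng(1).normal(size=(1000, 64))
g = group_tokens(x, 8)
assert g.shape == (125, 64)
```

Because attention cost scales quadratically with sequence length, even a modest group size yields a large reduction when fusing long embeddings at every level.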
Problem

Research questions and friction points this paper is trying to address.

Learning transferable models for clinical trial outcome prediction
Reducing false positives and negatives in outcome predictions
Improving prediction accuracy for phase I and phase II trials
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based pre-training for clinical trials
Novel multi-level fusion technique
Grouping block reduces computational overhead
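Beyond the points above, the summary notes that downstream adaptation uses parameter-efficient fine-tuning (PEFT). The paper does not name a specific method here, so the LoRA-style low-rank update below (weight names, dimensions, and rank are all assumptions) is only one common way PEFT is realized:

```python
import numpy as np

# LoRA-style PEFT sketch (illustrative assumption, not the paper's
# confirmed method): freeze the pretrained weight W and train only a
# low-rank update B @ A on top of it.
d_in, d_out, rank = 768, 768, 8
rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))        # frozen pretrained weight
A = rng.normal(size=(rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))               # trainable up-projection (zero init)

def adapted_forward(x):
    # Effective weight is W + B @ A; at init B is zero, so the adapted
    # model reproduces the pretrained model exactly.
    return x @ (W + B @ A).T

trainable = A.size + B.size   # 2 * rank * 768 = 12288 parameters
frozen = W.size               # 768 * 768 = 589824 parameters
assert trainable < frozen / 40  # ~2% of the full layer's parameters
```

Only `A` and `B` receive gradients during downstream adaptation, which keeps fine-tuning cheap while leaving the pretrained representation intact.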