🤖 AI Summary
Frequent amendments to clinical trial eligibility criteria cause delays and increase costs. To address this, the work introduces “eligibility criteria amendment prediction” as a novel natural language processing task: forecasting whether the eligibility criteria of an initial trial protocol will later be amended. It presents AMEND++, a benchmark suite comprising AMEND, a dataset of eligibility-criteria version histories and amendment labels from public clinical trials, and AMEND_LLM, a refined subset curated with an LLM-based denoising pipeline to isolate substantive changes. The authors also propose Change-Aware Masked Language Modeling (CAMLM), a pretraining strategy that uses historical edits to learn amendment-sensitive representations. Experiments across diverse baselines show that CAMLM consistently improves amendment prediction, supporting more robust and cost-effective clinical trial design.
📝 Abstract
Clinical trial amendments frequently introduce delays, increased costs, and administrative burden, with eligibility criteria being the most commonly amended component. We introduce eligibility criteria amendment prediction, a novel NLP task that aims to forecast whether the eligibility criteria of an initial trial protocol will undergo future amendments. To support this task, we release AMEND++, a benchmark suite comprising two datasets: AMEND, which captures eligibility-criteria version histories and amendment labels from public clinical trials, and AMEND_LLM, a refined subset curated using an LLM-based denoising pipeline to isolate substantive changes. We further propose Change-Aware Masked Language Modeling (CAMLM), a revision-aware pretraining strategy that leverages historical edits to learn amendment-sensitive representations. Experiments across diverse baselines show that CAMLM consistently improves amendment prediction, enabling more robust and cost-effective clinical trial design.
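The abstract does not specify how CAMLM chooses which tokens to mask. As a purely illustrative sketch, and not the authors' implementation, the snippet below assumes one plausible reading: tokens that a historical edit replaced or deleted are masked with a higher probability than unchanged tokens. The function names (`changed_positions`, `change_aware_mask`), the whitespace tokenization, and the probabilities `p_changed` and `p_unchanged` are all hypothetical choices made for this example.

```python
import difflib
import random

MASK = "[MASK]"  # placeholder mask token; a real tokenizer defines its own


def changed_positions(old_tokens, new_tokens):
    """Indices in old_tokens that were replaced or deleted in new_tokens."""
    matcher = difflib.SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    changed = set()
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            changed.update(range(i1, i2))
    return changed


def change_aware_mask(old_tokens, new_tokens,
                      p_changed=0.5, p_unchanged=0.15, seed=0):
    """Mask edited tokens more often than unedited ones (assumed scheme).

    Returns (masked_tokens, labels), where labels[i] is the original token
    at masked positions and None elsewhere.
    """
    rng = random.Random(seed)
    changed = changed_positions(old_tokens, new_tokens)
    masked, labels = [], []
    for i, token in enumerate(old_tokens):
        p = p_changed if i in changed else p_unchanged
        if rng.random() < p:
            masked.append(MASK)
            labels.append(token)
        else:
            masked.append(token)
            labels.append(None)
    return masked, labels


# Two hypothetical versions of one eligibility criterion.
old = "Age 18 to 65 years".split()
new = "Age 18 to 75 years".split()

# Extreme probabilities make the behavior deterministic for illustration.
masked, labels = change_aware_mask(old, new, p_changed=1.0, p_unchanged=0.0)
print(" ".join(masked))  # Age 18 to [MASK] years
```

Under this assumed scheme, the model would be asked to reconstruct exactly the spans that trial sponsors later revised, which is one way a pretraining objective could become sensitive to amendments.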