🤖 AI Summary
In systematic reviews, timely integration of emerging evidence is hindered by the heterogeneous quality of preprints, which complicates inclusion decisions. To address this, we propose AutoConfidence, a multimodal framework that jointly leverages semantic embeddings (BERT) and large language model (LLM)-derived credibility scores to predict preprint trustworthiness. Crucially, it integrates a survival cure model to forecast both publication likelihood and time-varying publication risk. The framework operates end-to-end: parsing raw preprint text, generating semantic representations, assessing content credibility, and modeling publication dynamics. Empirically, the random forest classifier achieves an AUROC of 0.747, and the survival cure model attains an AUROC of 0.731 and a concordance index (C-index) of 0.667, with each added feature set improving on LLM-derived scores alone. AutoConfidence substantially reduces reliance on manual screening, enhancing the timeliness and rigor of evidence updating in systematic reviews.
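The C-index reported above measures how well predicted risks rank the observed times to publication: across comparable pairs, the preprint published earlier should carry the higher predicted risk. A minimal pure-Python sketch of this computation, using toy data rather than the paper's, might look like:

```python
def concordance_index(times, events, risk_scores):
    """C-index: fraction of comparable pairs in which the subject with the
    earlier observed event time also has the higher predicted risk.
    events[i] is 1 if publication was observed, 0 if censored; ties in
    predicted risk count as half-concordant."""
    concordant, permissible = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # a pair is comparable only when i's event is observed
            # and occurs strictly before j's (observed or censored) time
            if events[i] == 1 and times[i] < times[j]:
                permissible += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / permissible

# toy example: months to publication, event indicator, predicted risk
times  = [3, 5, 8, 12]
events = [1, 1, 0, 1]
risks  = [0.9, 0.3, 0.2, 0.4]
print(concordance_index(times, events, risks))  # → 0.8
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the model's 0.667 indicates a moderate but clearly better-than-chance ordering of publication times.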
📝 Abstract
**Background.** Systematic reviews in comparative effectiveness research require timely evidence synthesis. Preprints accelerate knowledge dissemination but vary in quality, posing challenges for systematic reviews.

**Methods.** We propose AutoConfidence (automated confidence assessment), a framework for predicting preprint publication that reduces reliance on manual curation and expands the range of predictors through three key advancements: (1) automated data extraction using natural language processing techniques, (2) semantic embeddings of titles and abstracts, and (3) large language model (LLM)-driven evaluation scores. We employed two prediction models: a random forest classifier for the binary publication outcome and a survival cure model that predicts both the binary outcome and publication risk over time.

**Results.** The random forest classifier achieved an AUROC of 0.692 with LLM-driven scores, improving to 0.733 with semantic embeddings and 0.747 with article usage metrics. The survival cure model reached an AUROC of 0.716 with LLM-driven scores, improving to 0.731 with semantic embeddings. For publication risk prediction, it achieved a concordance index of 0.658, increasing to 0.667 with semantic embeddings.

**Conclusion.** Our study advances preprint publication prediction through automated data extraction and the integration of multiple feature types. By combining semantic embeddings with LLM-driven evaluations, AutoConfidence improves predictive performance while reducing the manual annotation burden. The framework can facilitate the systematic incorporation of preprint articles into evidence-based medicine, supporting researchers in evaluating and using preprint resources more effectively.
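The classification arm described in Methods can be sketched as follows: concatenate embedding features with an LLM-derived score, fit a random forest, and score it by AUROC. This is an illustrative sketch on synthetic data, not the paper's pipeline; the feature layout, dimensions, and label-generating process are all placeholder assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
emb = rng.normal(size=(n, 8))                # stand-in for title/abstract embeddings
llm_score = rng.uniform(0, 1, size=(n, 1))   # stand-in for an LLM credibility score
X = np.hstack([emb, llm_score])              # combined feature matrix

# synthetic publication label, weakly driven by the features
y = (emb[:, 0] + 2 * llm_score[:, 0] + rng.normal(size=n) > 1.0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# AUROC on held-out data; well above chance (0.5) on this synthetic signal
auroc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
```

In the paper's setup, the analogous feature sets (LLM scores, then embeddings, then usage metrics) are added incrementally, which is what produces the 0.692 → 0.733 → 0.747 AUROC progression reported in Results.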