🤖 AI Summary
To address the low efficiency and poor reproducibility of manual risk-of-bias (RoB) assessment with the RoB2 tool, this study develops an open-source, web-based interactive platform for assisted RoB2 evaluation. Methodologically, it integrates PDF parsing, RoB2-specific retrieval-augmented generation (RAG), large language model (LLM) reasoning, and a human-in-the-loop feedback mechanism for iterative refinement. A key contribution is a pediatric RoB2 annotation dataset of 521 clinical trial reports (8,954 signaling questions with 1,202 evidence passages), annotated through both manual and LLM-assisted human-AI collaboration and serving as a benchmark for LLM evaluation. The platform's source code and full dataset are publicly released. Benchmarking four mainstream LLMs on this dataset characterizes current model capabilities and the challenges that remain in automating this step of the systematic review workflow.
📝 Abstract
We present ROBOTO2, an open-source, web-based platform for large language model (LLM)-assisted risk of bias (ROB) assessment of clinical trials. ROBOTO2 streamlines the traditionally labor-intensive ROB v2 (ROB2) annotation process via an interactive interface that combines PDF parsing, retrieval-augmented LLM prompting, and human-in-the-loop review. Users can upload clinical trial reports, receive preliminary answers and supporting evidence for ROB2 signaling questions, and provide real-time feedback or corrections to system suggestions. ROBOTO2 is publicly available at https://roboto2.vercel.app/, with code and data released to foster reproducibility and adoption. We construct and release a dataset of 521 pediatric clinical trial reports (8954 signaling questions with 1202 evidence passages), annotated using both manual and LLM-assisted methods, which serves as a benchmark and enables future research. Using this dataset, we benchmark four LLMs on ROB2 assessment and provide an analysis of current model capabilities and ongoing challenges in automating this critical aspect of systematic review.
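The retrieval-augmented prompting step can be pictured with a minimal Python sketch. It assumes the PDF has already been parsed to plain text, splits that text into paragraph-level passages, ranks them by simple lexical overlap with a ROB2 signaling question, and assembles a prompt listing the ROB2 answer options and the candidate evidence. The lexical ranker, the stopword list, the prompt wording, and every function name below are hypothetical stand-ins for illustration, not ROBOTO2's actual implementation.

```python
import re
from collections import Counter

# ROB2 response options for signaling questions (Y / PY / PN / N / NI).
ANSWER_OPTIONS = ["Yes", "Probably yes", "Probably no", "No", "No information"]

# Hypothetical stopword list so function words do not drive the ranking.
STOPWORDS = {"a", "an", "and", "be", "in", "is", "of", "or", "the", "to", "was", "were"}


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens; a stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_passages(full_text: str) -> list[str]:
    """Split parsed PDF text into paragraph-level passages at blank lines."""
    return [p.strip() for p in full_text.split("\n\n") if p.strip()]


def score(question: str, passage: str) -> int:
    """Lexical overlap: how often the question's content words occur in the passage."""
    terms = set(tokenize(question)) - STOPWORDS
    counts = Counter(tokenize(passage))
    return sum(counts[t] for t in terms)


def retrieve(question: str, passages: list[str], k: int = 2) -> list[str]:
    """Return up to k passages with nonzero overlap, best first.

    sorted() is stable, so tied passages keep their document order.
    """
    scores = [score(question, p) for p in passages]
    order = sorted(range(len(passages)), key=lambda i: -scores[i])
    return [passages[i] for i in order[:k] if scores[i] > 0]


def build_prompt(question: str, evidence: list[str]) -> str:
    """Assemble a hypothetical LLM prompt: question, allowed answers, numbered evidence."""
    lines = [
        "You are assessing risk of bias with the ROB2 tool.",
        f"Signaling question: {question}",
        "Answer with one of: " + ", ".join(ANSWER_OPTIONS) + ".",
        "Quote the passage(s) that support your answer.",
        "",
        "Candidate evidence:",
    ]
    lines += [f"[{i}] {p}" for i, p in enumerate(evidence, start=1)]
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up report text for demonstration only.
    report = (
        "Patients were enrolled at three pediatric centres.\n\n"
        "Randomisation used a computer-generated random sequence in blocks of four.\n\n"
        "The primary outcome was length of hospital stay."
    )
    question = "Was the allocation sequence random?"  # ROB2 signaling question 1.1
    evidence = retrieve(question, chunk_passages(report))
    print(build_prompt(question, evidence))
```

In this toy run only the randomisation paragraph shares content words ("random", "sequence") with the question, so it is the single passage offered as evidence. A production system would more likely use an embedding-based retriever, and in a human-in-the-loop setup the reviewer would then accept or correct the model's answer and cited evidence.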