ROBOTO2: An Interactive System and Dataset for LLM-assisted Clinical Trial Risk of Bias Assessment

📅 2025-11-04
🏛️ Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the low efficiency and poor reproducibility of manual risk-of-bias (ROB) assessment under the ROB2 framework, this work presents an open-source, web-based interactive platform for LLM-assisted ROB2 evaluation. Methodologically, it integrates intelligent PDF parsing, ROB2-specific retrieval-augmented generation (RAG), large language model (LLM) reasoning, and a human-in-the-loop feedback mechanism for iterative refinement. A key contribution is the release of a pediatric ROB2 annotation dataset of 521 clinical trial reports with over 10,000 fine-grained annotations (8,954 signaling-question answers and 1,202 evidence passages), created through both manual and LLM-assisted annotation and serving as a benchmark for LLM evaluation. The platform's source code and full dataset are publicly released. An empirical evaluation of four mainstream LLMs on this benchmark analyzes current model capabilities in evidence localization and collaborative assessment, highlighting the ongoing challenges of automating this critical step of systematic review workflows.

📝 Abstract
We present ROBOTO2, an open-source, web-based platform for large language model (LLM)-assisted risk of bias (ROB) assessment of clinical trials. ROBOTO2 streamlines the traditionally labor-intensive ROB v2 (ROB2) annotation process via an interactive interface that combines PDF parsing, retrieval-augmented LLM prompting, and human-in-the-loop review. Users can upload clinical trial reports, receive preliminary answers and supporting evidence for ROB2 signaling questions, and provide real-time feedback or corrections to system suggestions. ROBOTO2 is publicly available at https://roboto2.vercel.app/, with code and data released to foster reproducibility and adoption. We construct and release a dataset of 521 pediatric clinical trial reports (8954 signaling questions with 1202 evidence passages), annotated using both manual and LLM-assisted methods, serving as a benchmark and enabling future research. Using this dataset, we benchmark ROB2 performance for four LLMs and provide an analysis of current model capabilities and ongoing challenges in automating this critical aspect of systematic review.
Problem

Research questions and friction points this paper is trying to address.

Automating clinical trial risk of bias assessment using LLM-assisted methods
Streamlining labor-intensive ROB2 annotation through interactive human-AI collaboration
Benchmarking LLM performance on pediatric clinical trial bias evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Web-based platform for LLM-assisted bias assessment
Combines PDF parsing, retrieval-augmented prompting, and human review
Provides preliminary answers with evidence for signaling questions
Anthony Hevia
University of Washington
S. Chintalapati
University of Washington
Veronica Ka Wai Lai
The Hospital for Sick Children
Thanh Tam Nguyen
Lecturer, Griffith University
Social Network Mining · Stream Processing · Big Data · Privacy-Preserving ML · Recommender Systems
Wai-Tat Wong
The Chinese University of Hong Kong
Terry Klassen
University of Saskatchewan
Lucy Lu Wang
University of Washington; Allen Institute for AI (Ai2)
health informatics · natural language processing · science communication · open access