ROBOTO2: An Interactive System and Dataset for LLM-assisted Clinical Trial Risk of Bias Assessment

📅 2025-11-04
🏛️ Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the low efficiency and poor reproducibility of manual risk-of-bias (ROB) assessment under the ROB2 framework, this work presents an open-source, web-based interactive platform for LLM-assisted ROB2 evaluation. Methodologically, it integrates intelligent PDF parsing, ROB2-specific retrieval-augmented generation (RAG), large language model (LLM) reasoning, and a human-in-the-loop feedback mechanism for iterative refinement. A key contribution is the release of a pediatric ROB2 annotation dataset of 521 clinical trial reports with over 10,000 fine-grained annotations (8,954 signaling-question answers and 1,202 evidence passages), created through both manual and LLM-assisted annotation and serving as a benchmark for LLM evaluation. The platform's source code and full dataset are publicly released. An empirical evaluation of four mainstream LLMs on this benchmark analyzes current model capabilities in evidence localization and collaborative assessment, highlighting the ongoing challenges of automating this critical step of systematic review workflows.

📝 Abstract
We present ROBOTO2, an open-source, web-based platform for large language model (LLM)-assisted risk of bias (ROB) assessment of clinical trials. ROBOTO2 streamlines the traditionally labor-intensive ROB v2 (ROB2) annotation process via an interactive interface that combines PDF parsing, retrieval-augmented LLM prompting, and human-in-the-loop review. Users can upload clinical trial reports, receive preliminary answers and supporting evidence for ROB2 signaling questions, and provide real-time feedback or corrections to system suggestions. ROBOTO2 is publicly available at https://roboto2.vercel.app/, with code and data released to foster reproducibility and adoption. We construct and release a dataset of 521 pediatric clinical trial reports (8954 signaling questions with 1202 evidence passages), annotated using both manual and LLM-assisted methods, serving as a benchmark and enabling future research. Using this dataset, we benchmark ROB2 performance for four LLMs and provide an analysis of current model capabilities and ongoing challenges in automating this critical aspect of systematic review.
Problem

Research questions and friction points this paper is trying to address.

Automating clinical trial risk of bias assessment using LLM-assisted methods
Streamlining labor-intensive ROB2 annotation through interactive human-AI collaboration
Benchmarking LLM performance on pediatric clinical trial bias evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Web-based platform for LLM-assisted bias assessment
Combines PDF parsing, retrieval-augmented prompting, and human review
Provides preliminary answers with evidence for signaling questions
Anthony Hevia
University of Washington
S. Chintalapati
University of Washington
Veronica Ka Wai Lai
The Hospital for Sick Children
Thanh Tam Nguyen
Lecturer, Griffith University
Social Network Mining · Stream Processing · Big Data · Privacy-Preserving ML · Recommender Systems
Wai-Tat Wong
The Chinese University of Hong Kong
Terry Klassen
University of Saskatchewan
Lucy Lu Wang
University of Washington; Allen Institute for AI (Ai2)
health informatics · natural language processing · science communication · open access