🤖 AI Summary
Systematic literature reviews (SLRs) are resource-intensive and time-consuming, often impeding timely evidence-based decision-making. To address this challenge, this study proposes AgentSLR, the first end-to-end automated AI agent pipeline that integrates literature search, screening, data extraction, and report generation, with feasibility demonstrated in the domain of epidemiology. Built upon state-of-the-art large language models, AgentSLR employs an open-source agent architecture refined through human feedback, emphasizing model capability over sheer parameter count. Evaluated on systematic reviews for nine WHO-priority pathogens, AgentSLR achieves accuracy comparable to that of human experts while reducing the average review duration from seven weeks to just 20 hours, a 58-fold efficiency gain.
📝 Abstract
Systematic literature reviews are essential for synthesizing scientific evidence but are costly, difficult to scale, and time-intensive, creating bottlenecks for evidence-based policy. We study whether large language models can automate the complete systematic review workflow, from article retrieval and screening through data extraction to report synthesis. Applied to epidemiological reviews of nine WHO-designated priority pathogens and validated against expert-curated ground truth, our open-source agentic pipeline (AgentSLR) achieves performance comparable to human researchers while reducing review time from approximately 7 weeks to 20 hours (a 58x speed-up). Our comparison of five frontier models reveals that performance on SLRs is driven less by model size or inference cost than by each model's distinctive capabilities. Through human-in-the-loop validation, we identify key failure modes. Our results demonstrate that agentic AI can substantially accelerate scientific evidence synthesis in specialised domains.