Can Agents Judge Systematic Reviews Like Humans? Evaluating SLRs with LLM-based Multi-Agent System

📅 2025-09-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address labor-intensive evaluation and inconsistent cross-disciplinary standards in systematic literature review (SLR) quality assessment, this paper proposes a large language model (LLM)-based multi-agent system (MAS) rigorously aligned with the PRISMA guidelines. The MAS comprises specialized agents for protocol validation, methodological appraisal, and thematic relevance checking, enabling structured, interpretable, and collaborative reasoning—unlike monolithic single-agent approaches. Integrated with academic database retrieval and NLP techniques, the system delivers end-to-end automated SLR quality scoring. Evaluated on SLR datasets spanning five disciplines, it achieves 84% agreement with domain expert annotations on PRISMA-based scores, demonstrating strong cross-disciplinary applicability and reliability.

📝 Abstract
Systematic Literature Reviews (SLRs) are foundational to evidence-based research but remain labor-intensive and prone to inconsistency across disciplines. We present an LLM-based SLR evaluation copilot built on a Multi-Agent System (MAS) architecture to assist researchers in assessing the overall quality of systematic literature reviews. The system automates protocol validation, methodological assessment, and topic relevance checks using a scholarly database. Unlike conventional single-agent methods, our design integrates a specialized agentic approach aligned with PRISMA guidelines to support more structured and interpretable evaluations. We conducted an initial study on five published SLRs from diverse domains, comparing system outputs to expert-annotated PRISMA scores, and observed 84% agreement. While early results are promising, this work represents a first step toward scalable and accurate NLP-driven systems for interdisciplinary workflows, and it reveals their capacity for rigorous, domain-agnostic knowledge aggregation to streamline the review process.
Problem

Research questions and friction points this paper is trying to address.

Automating systematic literature review quality assessment
Reducing labor-intensive manual evaluation inconsistencies
Validating review protocols using multi-agent systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based multi-agent system architecture
Automated protocol validation and assessment
PRISMA-aligned specialized agentic approach
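The paper does not publish its implementation, but the architecture it describes can be sketched roughly: specialized evaluator agents (protocol validation, methodological appraisal, thematic relevance) each score one PRISMA-aligned aspect of a review, and a coordinator aggregates their verdicts into an overall quality score. The sketch below is a hypothetical illustration of that pattern, not the authors' system; all field names, checklist items, and weights are assumptions, and the per-agent LLM calls are stubbed with rule-based checks.

```python
from dataclasses import dataclass

# Hypothetical sketch of the multi-agent SLR evaluation pattern: each
# specialized agent scores one PRISMA-style aspect of a review; a
# coordinator averages the per-agent scores into an overall verdict.
# In a real system each agent would prompt an LLM; scoring is stubbed here.

@dataclass
class AgentReport:
    agent: str
    score: float       # 0.0-1.0 compliance with this agent's checklist
    rationale: str

def protocol_agent(slr: dict) -> AgentReport:
    # Checks protocol-related items, e.g. registration and search strategy.
    hits = sum(k in slr for k in ("protocol_registered", "search_strategy"))
    return AgentReport("protocol", hits / 2, f"{hits}/2 protocol items present")

def methodology_agent(slr: dict) -> AgentReport:
    # Checks methodological items, e.g. eligibility criteria and risk of bias.
    hits = sum(k in slr for k in ("eligibility_criteria", "risk_of_bias"))
    return AgentReport("methodology", hits / 2, f"{hits}/2 method items present")

def relevance_agent(slr: dict) -> AgentReport:
    # Checks whether included studies overlap the review's stated topic keywords.
    topic = set(slr.get("topic_keywords", []))
    studies = slr.get("included_study_keywords", [])
    if not studies:
        return AgentReport("relevance", 0.0, "no included studies listed")
    matched = sum(1 for kws in studies if topic & set(kws))
    return AgentReport("relevance", matched / len(studies),
                       f"{matched}/{len(studies)} studies on topic")

def evaluate_slr(slr: dict) -> tuple[float, list[AgentReport]]:
    # Coordinator: run all agents and average their scores (equal weights).
    reports = [protocol_agent(slr), methodology_agent(slr), relevance_agent(slr)]
    overall = sum(r.score for r in reports) / len(reports)
    return round(overall, 2), reports
```

The interpretability claim in the summary maps onto the `rationale` field: each agent's verdict carries its own justification, so a low overall score can be traced back to the specific PRISMA aspect that failed.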
Abdullah Mushtaq
Research Assistant
Deep Learning · Multi-Agent Systems · Agentic AI
Muhammad Rafay Naeem
Department of Computer Science, Information Technology University
Ibrahim Ghaznavi
Assistant Professor
Spatial Computing · Immersive Technologies · Human Computer Interaction · ICTD
Alaa Abd-alrazaq
AI Center for Precision Health, Weill Cornell Medicine–Qatar
Aliya Tabassum
Computer Science and Engineering Department, Qatar University
Junaid Qadir
Professor of Computer Engineering, Qatar University
Human-centered AI · AI Ethics · Engineering Education · AI in Education · Healthcare AI