AISysRev - LLM-based Tool for Title-abstract Screening

📅 2025-10-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Manual title-abstract screening in systematic literature reviews (SLRs) in software engineering is time-consuming and laborious. Method: This study presents AiSysRev, a lightweight LLM-based assistant tool for preliminary screening, deployed as a Dockerized web application that integrates multiple open LLMs via OpenRouter and supports both zero-shot and few-shot prompting. A trial study with 137 papers shows that papers fall into four classes—Easy Includes, Easy Excludes, Boundary Includes, and Boundary Excludes. Contribution/Results: The Boundary cases, where LLMs are prone to errors, expose the models' limitations on ambiguous papers and underscore the necessity of human-AI collaboration. While LLMs do not replace human judgment in systematic reviews, the tool can substantially reduce the burden of assessing large volumes of scientific literature, making it well suited to rapid pre-screening of large SLR candidate corpora under expert oversight.

📝 Abstract
Systematic reviews are a standard practice for summarizing the state of evidence in software engineering. Conducting systematic reviews is laborious, especially during the screening or study selection phase, where the number of papers can be overwhelming. During this phase, papers are assessed against inclusion and exclusion criteria based on their titles and abstracts. Recent research has demonstrated that large language models (LLMs) can perform title-abstract screening at a level comparable to that of a master's student. While LLMs cannot be fully trusted, they can help, for example, in Rapid Reviews, which try to expedite the review process. Building on recent research, we developed AiSysRev, an LLM-based screening tool implemented as a web application running in a Docker container. The tool accepts a CSV file containing paper titles and abstracts. Users specify inclusion and exclusion criteria. Multiple LLMs can be used for screening via OpenRouter. AiSysRev supports both zero-shot and few-shot screening, and also allows for manual screening through interfaces that display LLM results as guidance for human reviewers. We conducted a trial study with 137 papers using the tool. Our findings indicate that papers can be classified into four categories: Easy Includes, Easy Excludes, Boundary Includes, and Boundary Excludes. The Boundary cases, where LLMs are prone to errors, highlight the need for human intervention. While LLMs do not replace human judgment in systematic reviews, they can significantly reduce the burden of assessing large volumes of scientific literature. Video: https://www.youtube.com/watch?v=jVbEj4Y4tQI Tool: https://github.com/EvoTestOps/AISysRev
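To make the workflow concrete, here is a minimal sketch of zero-shot screening against OpenRouter's OpenAI-compatible chat-completions endpoint. This is not the AiSysRev implementation; the function names, prompt wording, and the fallback rule for unclear replies are assumptions for illustration only.

```python
# Illustrative sketch (NOT the AiSysRev source): zero-shot title-abstract
# screening via OpenRouter's OpenAI-compatible chat-completions API.
import json
import urllib.request

# The four-class taxonomy reported in the paper.
LABELS = ["Easy Include", "Easy Exclude", "Boundary Include", "Boundary Exclude"]

def build_prompt(inclusion, exclusion, title, abstract):
    """Compose a zero-shot screening prompt from the reviewer's criteria."""
    return (
        "You are screening papers for a systematic review.\n"
        f"Inclusion criteria: {inclusion}\n"
        f"Exclusion criteria: {exclusion}\n"
        f"Title: {title}\n"
        f"Abstract: {abstract}\n"
        "Answer with exactly one of: " + ", ".join(LABELS) + "."
    )

def parse_label(response_text):
    """Map a free-form model reply onto the four-class taxonomy."""
    lowered = response_text.lower()
    for label in LABELS:
        if label.lower() in lowered:
            return label
    # Unclear replies are routed to human review (an assumed policy).
    return "Boundary Include"

def screen_via_openrouter(api_key, model, prompt):
    """POST one paper to OpenRouter and return the parsed label."""
    req = urllib.request.Request(
        "https://openrouter.ai/api/v1/chat/completions",
        data=json.dumps({
            "model": model,  # e.g. an open-model slug available on OpenRouter
            "messages": [{"role": "user", "content": prompt}],
        }).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return parse_label(body["choices"][0]["message"]["content"])
```

Few-shot screening would extend `build_prompt` with labeled example papers before the candidate; routing unparseable replies to a Boundary class mirrors the paper's point that ambiguous cases need human intervention.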
Problem

Research questions and friction points this paper is trying to address.

Automating systematic review screening using large language models
Reducing manual effort in title-abstract paper assessment
Addressing boundary cases requiring human intervention in screening
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based web tool for title-abstract screening
Supports zero-shot and few-shot screening via OpenRouter
Docker-containerized application with human review interface