🤖 AI Summary
Current systematic literature review (SLR) abstract screening methods lack fine-grained, zero-shot relevance ranking capabilities.
Method: We propose a tuning-free two-stage framework: (1) explicitly modeling the SLR research question and inclusion/exclusion criteria as structured prompts for large language models (LLMs) to enable fine-grained, interpretable relevance scoring; and (2) integrating dense re-ranking via a contrastive-learning-finetuned bi-encoder to mitigate the error propagation inherent in question-answering-based cascaded ranking.
Contribution/Results: Evaluated on a benchmark of 57 medical-domain SLRs, our approach achieves an average mean average precision (MAP) improvement of 5–10 percentage points over state-of-the-art QA-based ranking methods. The implementation code and annotated dataset are publicly released.
📄 Abstract
The scientific literature is growing rapidly, making it hard to keep track of the state of the art. Systematic literature reviews (SLRs) aim to identify and evaluate all relevant papers on a topic. After retrieving a set of candidate papers, the abstract screening phase determines initial relevance. To date, abstract screening methods using large language models (LLMs) have focused on binary classification settings; existing question answering (QA) based ranking approaches suffer from error propagation. LLMs offer a unique opportunity to evaluate the SLR's inclusion and exclusion criteria, yet existing benchmarks do not provide them exhaustively. We manually extract these criteria as well as research questions for 57 SLRs, mostly in the medical domain, enabling principled comparisons between approaches. Moreover, we propose LGAR, a zero-shot LLM Guided Abstract Ranker composed of an LLM-based graded relevance scorer and a dense re-ranker. Our extensive experiments show that LGAR outperforms existing QA-based methods by 5–10 pp. in mean average precision. Our code and data are publicly available.
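The two-stage pipeline described above can be sketched in outline: stage 1 prompts an LLM with the SLR's research question and inclusion/exclusion criteria to assign a graded relevance score per abstract, and stage 2 re-ranks by dense similarity so that abstracts with equal LLM grades are still finely ordered. This is a minimal illustrative sketch, not the paper's implementation: `llm_grade` and `dense_similarity` are hypothetical stand-ins for the actual LLM scorer and bi-encoder, and the prompt wording and 0-3 grade scale are assumptions.

```python
def build_prompt(research_question, inclusion, exclusion, abstract):
    """Assemble a structured prompt exposing the SLR criteria to the LLM.

    The field names and the 0-3 integer scale are illustrative assumptions,
    not the exact prompt format used in the paper.
    """
    return (
        f"Research question: {research_question}\n"
        f"Inclusion criteria: {'; '.join(inclusion)}\n"
        f"Exclusion criteria: {'; '.join(exclusion)}\n"
        f"Abstract: {abstract}\n"
        "Rate the relevance of this abstract on an integer scale from 0 to 3."
    )


def rank_abstracts(abstracts, llm_grade, dense_similarity):
    """Two-stage zero-shot ranking sketch.

    Stage 1: llm_grade(abstract) -> coarse graded relevance score.
    Stage 2: dense_similarity(abstract) -> score used to order abstracts
             that received the same LLM grade (dense re-ranking).
    Both callables are caller-supplied stubs in this sketch.
    """
    scored = [(llm_grade(a), dense_similarity(a), a) for a in abstracts]
    # Primary key: LLM grade; secondary key: dense similarity breaks ties.
    scored.sort(key=lambda t: (t[0], t[1]), reverse=True)
    return [a for _, _, a in scored]
```

With stub scorers, two abstracts sharing the top LLM grade are ordered by their dense similarity, which is the role the re-ranker plays in mitigating the coarseness of graded LLM scores.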