A Large-Scale, Cross-Disciplinary Corpus of Systematic Reviews

📅 2026-04-23

🤖 AI Summary

Existing benchmarks for systematic reviews are limited in scale or in disciplinary coverage, which hinders cross-domain method evaluation and meta-science research. This work introduces Webis-SR4ALL-26, a large-scale, cross-disciplinary corpus of 301,871 systematic reviews spanning all scientific fields covered by OpenAlex. A multi-stage pre-processing pipeline links each review to resolved OpenAlex metadata and reference lists and extracts, where explicitly reported, structured method artifacts such as search strategies and inclusion/exclusion criteria. Reported search strategies (Boolean queries or keyword lists) are normalized into executable approximations. The authors release the corpus, the pre-processing pipeline, and the code used for extraction validation and the retrieval demonstration. As one concrete use case, they report large-scale baseline retrieval signals obtained by executing the normalized search strategies in OpenAlex and comparing the retrieved sets to the resolved reference lists.

📝 Abstract

Existing benchmarks for systematic reviewing remain limited either in scale or in disciplinary coverage, with some collections comprising only a modest number of topics and others focusing primarily on biomedical research. We present Webis-SR4ALL-26, a large-scale, cross-disciplinary corpus of 301,871 systematic reviews spanning all scientific fields as covered by OpenAlex. Using a multi-stage pre-processing pipeline, we link reviews to resolved OpenAlex metadata and reference lists and extract, when explicitly reported, structured method artifacts relevant to retrieval and screening. These artifacts include reported search strategies (Boolean queries or keyword lists) that we normalize into executable approximations, as well as reported inclusion and exclusion criteria. Together, these layers support cross-domain benchmarking of retrieval and screening components against review reference lists, training and evaluation of extraction methods for review artifacts, and comparative meta-science analyses of systematic review practices across disciplines and time. To demonstrate one concrete use case, we report large-scale baseline retrieval signals by executing normalized search strategies in OpenAlex and comparing retrieved sets to resolved reference lists. We release the corpus and the pre-processing pipeline, along with code used for extraction validation and the retrieval demonstration.
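The retrieval demonstration described in the abstract compares the set of works returned by a normalized search strategy with a review's resolved reference list. The following is a minimal sketch of such a comparison, not the authors' evaluation code. The abstract does not name the metrics used, so recall and precision are assumptions here, as are the function name `retrieval_overlap` and the made-up work IDs.

```python
from typing import Dict, Iterable


def retrieval_overlap(retrieved: Iterable[str], references: Iterable[str]) -> Dict[str, float]:
    """Compare retrieved work IDs with a review's reference list.

    Recall and precision are assumed metrics; the source only speaks of
    "baseline retrieval signals" without naming them.
    """
    retrieved_set = set(retrieved)
    reference_set = set(references)
    hits = retrieved_set & reference_set

    # Guard against empty sets so the ratios are always defined.
    recall = len(hits) / len(reference_set) if reference_set else 0.0
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    return {"hits": len(hits), "recall": recall, "precision": precision}


# Hypothetical identifiers, invented purely for illustration.
retrieved = ["W1", "W2", "W3", "W4"]   # works returned by a normalized query
references = ["W2", "W4", "W9"]        # works in the review's reference list
scores = retrieval_overlap(retrieved, references)
```

In this toy case two of the three referenced works are retrieved, so recall is 2/3 and precision is 1/2. A reference list is only a proxy for the set of relevant works, so such scores are better read as baseline signals than as absolute effectiveness.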

Problem

Research questions and friction points this paper is trying to address.

systematic reviews

cross-disciplinary

benchmarking

retrieval

screening

Innovation

Methods, ideas, or system contributions that make the work stand out.

systematic review

cross-disciplinary corpus

search strategy normalization