FRONTIER-RevRec: A Large-scale Dataset for Reviewer Recommendation

📅 2025-10-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current reviewer recommendation research is hindered by the absence of large-scale, multi-disciplinary, and reproducible benchmark datasets. To address this, we introduce FRONTIER-RevRec—the largest publicly available reviewer recommendation benchmark to date—comprising 209 interdisciplinary journals, 478,000 papers, and 178,000 reviewers. Through systematic analysis, we reveal fundamental structural differences between academic and commercial recommendation: content-based methods substantially outperform collaborative filtering, and language models more effectively capture semantic alignment between papers and reviewers. Building on these insights, we propose a novel method that jointly encodes paper text and reviewer historical review records via semantically enriched representations and an optimized aggregation strategy. Our approach achieves significant improvements across multiple evaluation metrics. FRONTIER-RevRec establishes a standardized evaluation framework for automated peer review and sets a new state-of-the-art baseline for future research.
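The pipeline described above (encode the submission text, aggregate a reviewer's historical review records into a profile, rank reviewers by semantic similarity) can be sketched roughly as follows. This is an illustrative stand-in, not the paper's implementation: the bag-of-words embedding and mean-pooling aggregation below are simple placeholders for the language-model encoder and the optimized aggregation strategy the authors actually study, and all function names are hypothetical.

```python
# Hypothetical sketch of content-based reviewer recommendation.
# A real system would replace embed() with a language-model encoder
# (e.g. a sentence transformer); a toy bag-of-words vector keeps
# this example self-contained and runnable.
from collections import Counter
import math

def embed(text, vocab):
    # Bag-of-words count vector over a fixed vocabulary.
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def reviewer_profile(history, vocab):
    # Mean-pool the embeddings of a reviewer's past reviewed papers --
    # one simple aggregation strategy among those the paper compares.
    vecs = [embed(t, vocab) for t in history]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def recommend(paper_text, reviewers, vocab, k=2):
    # Rank reviewers by similarity between the submission and each profile.
    query = embed(paper_text, vocab)
    scores = {name: cosine(query, reviewer_profile(hist, vocab))
              for name, hist in reviewers.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy usage: two reviewers with disjoint review histories.
reviewers = {
    "rev_a": ["graph neural networks survey", "neural recommendation models"],
    "rev_b": ["clinical trial outcomes", "medical imaging analysis"],
}
paper = "graph neural networks for recommendation"
vocab = sorted({w for texts in reviewers.values() for t in texts
                for w in t.lower().split()} | set(paper.split()))
ranked = recommend(paper, reviewers, vocab, k=1)
```

On FRONTIER-RevRec itself, the same shape applies at scale: one profile per reviewer built from their 2007-2025 review records, scored against each incoming submission.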

📝 Abstract
Reviewer recommendation is a critical task for enhancing the efficiency of academic publishing workflows. However, research in this area has been persistently hindered by the lack of high-quality benchmark datasets: existing datasets are often limited in scale and disciplinary scope, and lack comparative analyses of different methodologies. To address this gap, we introduce FRONTIER-RevRec, a large-scale dataset constructed from authentic peer review records (2007-2025) from the Frontiers open-access publishing platform (https://www.frontiersin.org/). The dataset contains 177,941 distinct reviewers and 478,379 papers across 209 journals spanning multiple disciplines, including clinical medicine, biology, psychology, engineering, and the social sciences. Our comprehensive evaluation on this dataset reveals that content-based methods significantly outperform collaborative filtering. This finding is explained by our structural analysis, which uncovers fundamental differences between academic recommendation and commercial domains. Notably, approaches leveraging language models are particularly effective at capturing the semantic alignment between a paper's content and a reviewer's expertise. Furthermore, our experiments identify optimal aggregation strategies to enhance the recommendation pipeline. FRONTIER-RevRec is intended to serve as a comprehensive benchmark to advance research in reviewer recommendation and to facilitate the development of more effective academic peer review systems. The FRONTIER-RevRec dataset is available at: https://anonymous.4open.science/r/FRONTIER-RevRec-5D05.
Problem

Research questions and friction points this paper is trying to address.

Addressing the lack of high-quality benchmark datasets for reviewer recommendation
Evaluating content-based methods versus collaborative filtering for reviewer matching
Identifying optimal aggregation strategies to improve academic recommendation systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Constructed large-scale dataset from authentic peer review records
Content-based methods outperform collaborative filtering approaches
Language models capture semantic alignment for reviewer expertise
Qiyao Peng
Tianjin University
Natural Language Processing · Recommender Systems
Chen Wang
Tianjin University, Tianjin, China
Yinghui Wang
National Key Laboratory of Information Systems Engineering, Beijing, China
Hongtao Liu
Du Xiaoman Financial
LLM · Recommender Systems
Xuan Guo
Tianjin University, Tianjin, China
Wenjun Wang
Tianjin University
Data Mining · Social Networks · Complex Networks · Smart Cities