AI Summary
To address inconsistencies, subjectivity, and poor scalability in peer review, this paper proposes a modular AI-augmented reviewing framework designed for seamless integration into real-world conference workflows. Built upon the GPT-OSS-120B large language model, the framework incorporates structured evaluation criteria, simulated reviewer role modeling, and LLM-driven review quality discrimination, systematically probing AI capabilities in critical tasks such as factual verification and literature coverage. It introduces, for the first time, principled human-AI collaborative reviewing guidelines. Evaluated on the ICLR 2025 dataset, the framework achieves 81.8% accuracy in acceptance decisions, matching human reviewers' average performance, while generating review comments of overall higher quality than human baselines. This work provides empirical evidence and a practical implementation pathway toward trustworthy, scalable, and equitable next-generation academic review paradigms.
Abstract
Peer review is the cornerstone of scientific publishing, yet it suffers from inconsistencies, reviewer subjectivity, and scalability challenges. We introduce ReviewerToo, a modular framework for studying and deploying AI-assisted peer review to complement human judgment with systematic and consistent assessments. ReviewerToo supports controlled experiments with specialized reviewer personas and structured evaluation criteria, and can be partially or fully integrated into real conference workflows. We validate ReviewerToo on a carefully curated dataset of 1,963 paper submissions from ICLR 2025, where our experiments with the gpt-oss-120b model achieve 81.8% accuracy on the task of categorizing a paper as accept/reject, compared to 83.9% for the average human reviewer. Additionally, ReviewerToo-generated reviews are rated as higher quality than the human average by an LLM judge, though still trailing the strongest expert contributions. Our analysis highlights domains where AI reviewers excel (e.g., fact-checking, literature coverage) and where they struggle (e.g., assessing methodological novelty and theoretical contributions), underscoring the continued need for human expertise. Based on these findings, we propose guidelines for integrating AI into peer-review pipelines, showing how AI can enhance consistency, coverage, and fairness while leaving complex evaluative judgments to domain experts. Our work provides a foundation for systematic, hybrid peer-review systems that scale with the growth of scientific publishing.
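The headline metric above (81.8% vs. 83.9%) is simple agreement between a reviewer's accept/reject decision and the venue's final decision. A minimal sketch of that computation, with hypothetical placeholder data rather than actual ReviewerToo outputs:

```python
# Illustrative sketch of the accept/reject accuracy metric: the fraction of
# papers where a reviewer's decision matches the venue's final decision.
# The decision lists below are hypothetical toy data, not ReviewerToo outputs.

def decision_accuracy(predicted, actual):
    """Fraction of papers where the predicted decision matches the actual one."""
    if len(predicted) != len(actual):
        raise ValueError("decision lists must be the same length")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)

# Hypothetical toy data: 1 = accept, 0 = reject
ai_decisions = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0]
venue_decisions = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]

print(f"accuracy = {decision_accuracy(ai_decisions, venue_decisions):.1%}")
# → accuracy = 80.0%
```

In the paper's setting the same calculation would run over all 1,963 ICLR 2025 submissions, with the venue's accept/reject outcomes as the reference labels.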