Paper Copilot: Tracking the Evolution of Peer Review in AI Conferences

📅 2025-10-15
🤖 AI Summary
Rising AI conference submissions have strained peer-review systems, exacerbating reviewer overload, poor topic matching, inconsistent evaluation criteria, superficial reviews, and weakened accountability under tight deadlines. Ad hoc policy interventions, though well-intentioned, further erode transparency and hinder systematic understanding of how review practices evolve. To address this, we construct the first longitudinal, cross-conference digital archive of peer-review data from top-tier computer science conferences, centered on ICLR. We propose a review-evolution analysis framework integrating web crawling, NLP, and temporal analytics, and release an open-source, structured, multi-year ICLR review dataset. This enables the first reproducible empirical study of review quality, consistency, and temporal dynamics, revealing failure modes and evolutionary patterns, and provides foundational infrastructure and theoretical grounding for evidence-based peer-review reform.

📝 Abstract
The rapid growth of AI conferences is straining an already fragile peer-review system, leading to heavy reviewer workloads, expertise mismatches, inconsistent evaluation standards, superficial or templated reviews, and limited accountability under compressed timelines. In response, conference organizers have introduced new policies and interventions to preserve review standards. Yet these ad hoc changes often create further concerns and confusion about the review process, leaving how papers are ultimately accepted, and how practices evolve across years, largely opaque. We present Paper Copilot, a system that creates durable digital archives of peer reviews across a wide range of computer science venues; an open dataset that enables researchers to study peer review at scale; and a large-scale empirical analysis of ICLR reviews spanning multiple years. By releasing both the infrastructure and the dataset, Paper Copilot supports reproducible research on the evolution of peer review. We hope these resources help the community track changes, diagnose failure modes, and inform evidence-based improvements toward a more robust, transparent, and reliable peer-review system.
Problem

Research questions and friction points this paper is trying to address.

Tracking peer review evolution in AI conferences over time
Addressing reviewer workload and expertise mismatch issues
Analyzing inconsistent evaluation standards and review quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Creates durable digital archives of peer reviews
Provides open dataset for large-scale peer review analysis
Enables reproducible research on peer review evolution
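As a concrete illustration of the kind of analysis such a dataset enables, the sketch below computes per-year mean rating and rating spread (a rough proxy for review consistency) from structured review records. This is a minimal, hypothetical example: the field names (`year`, `rating`) and the toy records are illustrative assumptions, not the released dataset's actual schema.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical review records; "year" and "rating" are illustrative
# field names, not necessarily the dataset's real schema.
reviews = [
    {"year": 2022, "rating": 6}, {"year": 2022, "rating": 3},
    {"year": 2023, "rating": 5}, {"year": 2023, "rating": 6},
    {"year": 2023, "rating": 8}, {"year": 2024, "rating": 5},
    {"year": 2024, "rating": 5},
]

def yearly_stats(records):
    """Group ratings by year; report mean and spread (sample std dev)."""
    by_year = defaultdict(list)
    for r in records:
        by_year[r["year"]].append(r["rating"])
    return {
        y: {
            "mean": mean(vals),
            "spread": stdev(vals) if len(vals) > 1 else 0.0,
        }
        for y, vals in sorted(by_year.items())
    }

stats = yearly_stats(reviews)
for y, s in stats.items():
    print(y, round(s["mean"], 2), round(s["spread"], 2))
```

On real data, the same grouping pattern extends to any per-review field (confidence, review length, score-decision gaps), which is how temporal trends in review quality and consistency can be tracked across conference years.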