🤖 AI Summary
Peer review at top AI conferences faces declining quality and growing author dissatisfaction due to surging submission volumes. This paper introduces Review Feedback Agent, the first system to empirically validate LLM-augmented reviewing via a large-scale randomized controlled trial. It employs multi-LLM collaboration for feedback generation, LLM-based self-auditing for reliability assessment, and an automated test suite to deliver real-time, trustworthy, and actionable revision suggestions to reviewers. Among reviewers who received AI-generated feedback, 27% revised their reviews, adding 80 words on average, and blinded evaluation found the updated reviews significantly more informative and specific. Over 12,000 AI-generated suggestions were incorporated into final reviews, and rebuttal discussion length increased. This work establishes the first empirical paradigm for LLM-enhanced peer review and provides both a methodological framework and a system-level solution for trustworthy AI-assisted scholarly evaluation.
📝 Abstract
Peer review at AI conferences is stressed by rapidly rising submission volumes, leading to deteriorating review quality and increased author dissatisfaction. To address these issues, we developed Review Feedback Agent, a system leveraging multiple large language models (LLMs) to improve review clarity and actionability by providing reviewers with automated feedback on vague comments, content misunderstandings, and unprofessional remarks. Implemented at ICLR 2025 as a large randomized controlled study, our system provided optional feedback on more than 20,000 randomly selected reviews. To ensure high-quality feedback at this scale, we also developed a suite of automated reliability tests, powered by LLMs, that acted as guardrails on feedback quality: feedback was sent to reviewers only if it passed all the tests. The results show that 27% of reviewers who received feedback updated their reviews, and over 12,000 feedback suggestions from the agent were incorporated by those reviewers. This suggests that many reviewers found the AI-generated feedback sufficiently helpful to merit updating their reviews. Incorporating AI feedback led to significantly longer reviews (an average increase of 80 words among those who updated after receiving feedback) and more informative reviews, as evaluated by blinded researchers. Moreover, reviewers who were selected to receive AI feedback were also more engaged during paper rebuttals, as seen in longer author-reviewer discussions. This work demonstrates that carefully designed LLM-generated review feedback can enhance peer review quality by making reviews more specific and actionable while increasing engagement between reviewers and authors. The Review Feedback Agent is publicly available at https://github.com/zou-group/review_feedback_agent.
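The pass-all-tests guardrail described in the abstract can be sketched as a simple gating step. This is a minimal illustrative sketch, not the paper's actual implementation: the function names and the two placeholder reliability checks are hypothetical stand-ins for the LLM-powered tests in the released system.

```python
from typing import Callable, Optional

# Hypothetical stand-ins for LLM-based reliability tests. Each inspects a
# (review, feedback) pair and returns True if the feedback passes that check.
def is_grounded_in_review(review: str, feedback: str) -> bool:
    # Placeholder logic; the real test would query an LLM auditor.
    return len(feedback.strip()) > 0

def is_actionable(review: str, feedback: str) -> bool:
    # Placeholder heuristic for "suggests a concrete revision".
    return "consider" in feedback.lower()

RELIABILITY_TESTS: list[Callable[[str, str], bool]] = [
    is_grounded_in_review,
    is_actionable,
]

def gate_feedback(review: str, feedback: str) -> Optional[str]:
    """Release feedback to the reviewer only if it passes every test."""
    if all(test(review, feedback) for test in RELIABILITY_TESTS):
        return feedback
    return None  # withheld: at least one guardrail failed

# Feedback passing all checks is released; feedback failing any check is withheld.
released = gate_feedback("The method is unclear.",
                         "Consider specifying which step of Section 3 is unclear.")
withheld = gate_feedback("The method is unclear.", "Bad review.")
```

The key design point mirrored here is the conjunctive gate: a single failed test withholds the feedback entirely rather than sending a partially vetted suggestion.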