Modeling LLM Agent Reviewer Dynamics in Elo-Ranked Review System

📅 2026-01-13
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study investigates the behavioral dynamics of large language model (LLM) agents within an Elo-based peer review system and their impact on decision quality. By constructing a multi-round interactive simulation environment, the authors introduce LLM reviewers with diverse personas that interact with area chairs, comparing settings with and without the Elo scoring mechanism and reviewer memory. The work presents the first integration of the Elo system into LLM-agent peer review, revealing that LLMs strategically optimize their scores rather than increase reviewing effort. Experiments grounded in real conference submission data demonstrate that incorporating the Elo mechanism significantly improves the decision accuracy of area chairs, highlighting the role of incentive-aligned scoring systems in shaping agent behavior.

Technology Category

Application Category

📝 Abstract
In this work, we explore the Large Language Model (LLM) agent reviewer dynamics in an Elo-ranked review system using real-world conference paper submissions. Multiple LLM agent reviewers with different personas are engage in multi round review interactions moderated by an Area Chair. We compare a baseline setting with conditions that incorporate Elo ratings and reviewer memory. Our simulation results showcase several interesting findings, including how incorporating Elo improves Area Chair decision accuracy, as well as reviewers'adaptive review strategy that exploits our Elo system without improving review effort. Our code is available at https://github.com/hsiangwei0903/EloReview.
Problem

Research questions and friction points this paper is trying to address.

LLM agent
reviewer dynamics
Elo-ranked system
peer review
multi-round interaction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Elo rating
LLM agent
peer review dynamics
reviewer memory
simulation framework
🔎 Similar Papers
No similar papers found.