🤖 AI Summary
This work proposes a large language model (LLM)-based approach to optimizing reviewer allocation at major machine learning conferences, where current practice often relies on random assignment or affinity-based heuristics that struggle to identify papers near the acceptance threshold. By using LLMs to perform pairwise comparisons of submissions and integrating these judgments into a Bradley–Terry model, the method constructs a ranking that predicts the borderline paper band without requiring any human reviews. The authors further introduce an "expected impact" metric that jointly considers the overlap (ρ) between the predicted and true borderline regions and the marginal decision value (Δ) gained from an additional review. This metric guides the targeted allocation of limited extra reviewing capacity to the papers whose acceptance outcomes are most likely to change under additional scrutiny. Retrospective proxy experiments are used to estimate ρ and Δ and thereby the expected impact of the proposed targeted allocation.
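The ranking step described above can be sketched concretely. Given counts of LLM pairwise wins between submissions, Bradley–Terry strengths can be fit with the classic minorization–maximization (MM) update. This is a minimal illustrative implementation, not the paper's code; the function name and iteration count are assumptions.

```python
import numpy as np

def fit_bradley_terry(n_items, wins, n_iter=200):
    """Fit Bradley-Terry strengths p via the MM update.

    wins[i][j] = number of times item i beat item j in pairwise
    comparisons (here, LLM judgments between paper pairs).
    Returns strengths normalized to sum to 1; sorting by strength
    yields the predicted ranking.
    """
    W = np.asarray(wins, dtype=float)
    total = W + W.T            # comparisons made between each pair
    w_i = W.sum(axis=1)        # total wins per item
    p = np.ones(n_items)
    for _ in range(n_iter):
        # MM update: p_i <- w_i / sum_{j != i} n_ij / (p_i + p_j)
        denom = total / (p[:, None] + p[None, :])
        np.fill_diagonal(denom, 0.0)
        p = w_i / denom.sum(axis=1)
        p /= p.sum()           # fix the scale (strengths are only relative)
    return p
```

The update converges when the comparison graph is connected and every item has at least one win and one loss; in practice a small smoothing count can be added to `wins` to guarantee this.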
📝 Abstract
This paper argues that large ML conferences should allocate marginal review capacity primarily to papers near the acceptance boundary, rather than spreading extra reviews via random or affinity-driven heuristics. We propose using LLM-based comparative ranking (via pairwise comparisons and a Bradley--Terry model) to identify a borderline band \emph{before} human reviewing and to allocate \emph{marginal} reviewer capacity at assignment time. Concretely, given a venue-specific minimum review target (e.g., 3 or 4), we use this signal to decide which papers receive one additional review (e.g., a 4th or 5th), without conditioning on any human reviews and without using LLM outputs for accept/reject. We provide a simple expected-impact calculation in terms of (i) the overlap between the predicted and true borderline sets ($\rho$) and (ii) the incremental value of an extra review near the boundary ($\Delta$), and we provide retrospective proxies to estimate these quantities.
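The expected-impact calculation can be illustrated with a small sketch: select a band of papers around the acceptance cutoff implied by the LLM-derived scores, measure its overlap $\rho$ with a reference borderline set, and combine $\rho$ with the per-review decision value $\Delta$. The function names and the specific product form $\rho \cdot \Delta \cdot |\text{band}|$ are illustrative assumptions, not the paper's exact expressions.

```python
import numpy as np

def borderline_band(scores, accept_rate, band_width):
    """Indices of the `band_width` papers straddling the acceptance
    cutoff implied by ranking `scores` in descending order."""
    order = np.argsort(scores)[::-1]          # best paper first
    cutoff = int(round(accept_rate * len(scores)))
    half = band_width // 2
    return set(order[max(0, cutoff - half): cutoff + half])

def expected_impact(pred_band, true_band, delta):
    """rho = fraction of the predicted band that is truly borderline.
    Expected decisions changed ~= rho * delta * |predicted band|,
    i.e. delta per extra review, paid only on true borderline papers.
    (Illustrative form under the stated assumptions.)"""
    rho = len(pred_band & true_band) / max(1, len(pred_band))
    return rho, rho * delta * len(pred_band)
```

For example, with 10 papers scored in strictly decreasing order, a 50% accept rate, and a band of 4, the predicted band is papers ranked 4–7; if the true borderline set shifts by one position, $\rho = 0.75$.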