🤖 AI Summary
This work addresses the challenges of outdated benchmarks and inaccurate proxy signals in reviewer assignment in the era of large language models. The authors propose RATE, a framework that constructs reviewer expertise profiles from recent publication keywords and leverages heuristic retrieval signals to generate weak preference supervision, enabling annotation-free reviewer-paper matching and ranking. They also introduce LR-bench, the first high-quality, timely benchmark dataset based on self-assessed reviewer familiarity. Evaluated on both LR-bench and the CMU gold standard, RATE significantly outperforms strong embedding-based baselines, achieving state-of-the-art performance. The code and dataset are publicly released.
📄 Abstract
Reviewer assignment is increasingly critical yet challenging in the LLM era, where rapid topic shifts render many pre-2023 benchmarks outdated and where proxy signals poorly reflect true reviewer familiarity. We address this evaluation bottleneck by introducing LR-bench, a high-fidelity, up-to-date benchmark curated from 2024-2025 AI/NLP manuscripts with five-level self-assessed familiarity ratings collected via a large-scale email survey, yielding 1055 expert-annotated paper-reviewer-score triples. We further propose RATE, a reviewer-centric ranking framework that distills each reviewer's recent publications into compact keyword-based profiles and fine-tunes an embedding model with weak preference supervision constructed from heuristic retrieval signals, enabling each manuscript to be matched directly against a reviewer profile. Across LR-bench and the CMU gold-standard dataset, our approach consistently achieves state-of-the-art performance, outperforming strong embedding baselines by a clear margin. We release LR-bench at https://huggingface.co/datasets/Gnociew/LR-bench, and a GitHub repository at https://github.com/Gnociew/RATE-Reviewer-Assign.
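To make the matching idea concrete, the sketch below ranks reviewers for a manuscript by similarity between the manuscript text and keyword-based reviewer profiles. It is a minimal illustration only: it substitutes bag-of-words cosine similarity for the fine-tuned embedding model described in the abstract, and all reviewer IDs and keyword strings are invented for the example.

```python
from collections import Counter
import math

def vectorize(text):
    # Crude stand-in for a learned embedding: bag-of-words term counts.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(cnt * b[term] for term, cnt in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_reviewers(manuscript, profiles):
    # profiles: reviewer id -> keyword string distilled from recent papers.
    m_vec = vectorize(manuscript)
    scores = {rid: cosine(m_vec, vectorize(kw)) for rid, kw in profiles.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical reviewer profiles for illustration.
profiles = {
    "rev_a": "large language models instruction tuning alignment",
    "rev_b": "graph neural networks molecular property prediction",
}
ranking = rank_reviewers("alignment of large language models", profiles)
print(ranking)  # rev_a shares several keywords, so it ranks first
```

In the actual framework, the count vectors would be replaced by embeddings from a model fine-tuned on weak preference pairs, but the reviewer-centric ranking structure is the same.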