🤖 AI Summary
This paper addresses the inefficiency and bias susceptibility of academic peer-reviewer assignment by reformulating it as a retrieval task that requires no explicit labels. The authors construct exHarmony, a standardized, reproducible, annotation-free benchmark built on OpenAlex, which treats a manuscript's own authors, the most similar experts, and citation relations as signals of reviewer suitability. They benchmark lexical matching, static neural embeddings (e.g., Word2Vec), and contextualized embeddings trained on scholarly literature (e.g., SciBERT), and introduce evaluation metrics that assess both relevance and diversity for this task. Results show that traditional methods perform reasonably well, while contextualized embeddings trained on scholarly literature perform best. The findings also underscore the need for further work to improve the diversity and effectiveness of reviewer assignments.
📝 Abstract
The peer review process is crucial for ensuring the quality and reliability of scholarly work, yet assigning suitable reviewers remains a significant challenge. Traditional manual methods are labor-intensive and often ineffective, leading to nonconstructive or biased reviews. This paper introduces the exHarmony (eHarmony but for connecting experts to manuscripts) benchmark, designed to address these challenges by re-imagining the Reviewer Assignment Problem (RAP) as a retrieval task. Utilizing the extensive data from OpenAlex, we propose a novel approach that considers a host of signals from the authors, the most similar experts, and the citation relations as potential indicators of a suitable reviewer for a manuscript. This approach allows us to develop a standard benchmark dataset for evaluating the reviewer assignment problem without needing explicit labels. We benchmark various methods, including traditional lexical matching, static neural embeddings, and contextualized neural embeddings, and introduce evaluation metrics that assess both relevance and diversity in the context of RAP. Our results indicate that while traditional methods perform reasonably well, contextualized embeddings trained on scholarly literature show the best performance. The findings underscore the importance of further research to enhance the diversity and effectiveness of reviewer assignments.
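To make the retrieval framing concrete, here is a minimal sketch (not the paper's implementation) of reviewer assignment as retrieval: each candidate reviewer is represented by text from their past publications, the manuscript acts as the query, and reviewers are ranked by similarity. The reviewer names, profile strings, and the bag-of-words scoring are illustrative stand-ins for the lexical and embedding-based matchers the paper benchmarks.

```python
# Illustrative sketch: the Reviewer Assignment Problem as retrieval.
# Reviewer profiles and manuscript text here are hypothetical toy data;
# real systems would use OpenAlex metadata and neural embeddings.
from collections import Counter
import math

def vectorize(text):
    """Bag-of-words term-frequency vector (lexical-matching stand-in)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_reviewers(manuscript, reviewer_profiles):
    """Treat the manuscript as a query; rank reviewers by similarity."""
    q = vectorize(manuscript)
    scores = {name: cosine(q, vectorize(profile))
              for name, profile in reviewer_profiles.items()}
    return sorted(scores, key=scores.get, reverse=True)

profiles = {
    "alice": "neural ranking models for information retrieval",
    "bob": "protein folding molecular dynamics simulation",
}
ranking = rank_reviewers("dense retrieval with neural embeddings", profiles)
print(ranking)  # "alice" ranks first: her profile shares query terms
```

Swapping the bag-of-words vectors for static or contextualized embeddings changes only `vectorize` and `cosine`; the retrieval formulation, and hence the benchmark, stays the same.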