🤖 AI Summary
Existing paper quality assessment methods suffer from high inference costs (LLMs) or inconsistent rating scales (regression models). This work proposes NAIPv2, a lightweight, debiased ranking framework based on domain- and year-aware pairwise learning. It introduces the Review Tendency Signal (RTS), a supervision signal that jointly encodes reviewer scores and confidences. The method integrates probabilistic signal aggregation with structured metadata modeling and is trained on NAIDv2, a large-scale, self-constructed dataset of ICLR submissions. The resulting model achieves linear-time inference while maintaining strong generalization: on the ICLR test set it attains state-of-the-art performance (78.2% AUC, 0.432 Spearman correlation), and on unseen NeurIPS submissions its predicted scores increase consistently across decision categories, from Rejected to Oral.
📝 Abstract
The ability to estimate the quality of scientific papers is central to how both humans and AI systems will advance scientific knowledge in the future. However, existing LLM-based estimation methods suffer from high inference cost, whereas the faster direct score regression approach is limited by scale inconsistencies. We present NAIPv2, a debiased and efficient framework for paper quality estimation. NAIPv2 employs pairwise learning within domain-year groups to reduce inconsistencies in reviewer ratings and introduces the Review Tendency Signal (RTS) as a probabilistic integration of reviewer scores and confidences. To support training and evaluation, we further construct NAIDv2, a large-scale dataset of 24,276 ICLR submissions enriched with metadata and detailed structured content. Trained on pairwise comparisons but enabling efficient pointwise prediction at deployment, NAIPv2 achieves state-of-the-art performance (78.2% AUC, 0.432 Spearman), while maintaining scalable, linear-time efficiency at inference. Notably, on unseen NeurIPS submissions, it further demonstrates strong generalization, with predicted scores increasing consistently across decision categories from Rejected to Oral. These findings establish NAIPv2 as a debiased and scalable framework for automated paper quality estimation, marking a step toward future scientific intelligence systems. Code and dataset are released at https://sway.cloud.microsoft/Pr42npP80MfPhvj8.
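The abstract's core ideas, a confidence-weighted supervision signal (RTS) and pairwise training that still permits pointwise scoring at deployment, can be illustrated with a minimal sketch. Note this is an assumption-laden illustration, not the paper's actual implementation: the exact RTS formula and loss are not given above, so a confidence-weighted average and a Bradley-Terry-style logistic loss on score differences are used as plausible stand-ins, and the function names are invented for this example.

```python
import numpy as np

def review_tendency_signal(scores, confidences):
    """Hypothetical RTS: confidence-weighted aggregation of reviewer scores.

    Reviewers with higher confidence contribute more to the target signal.
    """
    s = np.asarray(scores, dtype=float)
    w = np.asarray(confidences, dtype=float)
    return float(np.sum(w * s) / np.sum(w))

def pairwise_logistic_loss(f_a, f_b, target):
    """Bradley-Terry-style pairwise loss on the difference of pointwise scores.

    f_a, f_b: scalar model scores for two papers from the same domain-year
    group (pairing within groups reduces cross-venue rating inconsistencies).
    target: 1.0 if paper A has the higher review tendency, else 0.0.
    Because the loss depends only on f_a - f_b, the trained model still emits
    a single scalar per paper, so deployment is pointwise and linear-time.
    """
    p = 1.0 / (1.0 + np.exp(-(f_a - f_b)))
    return float(-(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)))
```

At inference, one forward pass per paper yields a comparable scalar score, so ranking n submissions costs n model evaluations plus a sort, rather than the O(n) LLM calls (each far more expensive) or pairwise O(n^2) comparisons.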