🤖 AI Summary
Novelty assessment in academic peer review remains a critical yet underexplored challenge. Method: This paper introduces the first large language model–based, structured novelty assessment framework. It emulates expert reviewer behavior via a three-stage automated pipeline: (1) structured extraction of submission content, (2) literature-aware retrieval and synthesis of related work, and (3) claim-level comparative reasoning that explicitly models independent claim verification and contextual inference. The design is informed by an analysis of large-scale human review corpora and combines literature-aware information extraction with evidence-driven judgment. Contribution/Results: Evaluated on 182 submissions to ICLR 2025, the framework achieves 86.5% alignment with human reviewers’ reasoning processes and 75.3% agreement on final novelty judgments, substantially outperforming existing LLM-based baselines. It also yields more transparent, literature-grounded assessments that are more consistent than ad hoc reviewer judgments.
📝 Abstract
Novelty assessment is a central yet understudied aspect of peer review, particularly in high-volume fields like NLP where reviewer capacity is increasingly strained. We present a structured approach for automated novelty evaluation that models expert reviewer behavior through three stages: content extraction from submissions, retrieval and synthesis of related work, and structured comparison for evidence-based assessment. Our method is informed by a large-scale analysis of human-written novelty reviews and captures key patterns such as independent claim verification and contextual reasoning. Evaluated on 182 ICLR 2025 submissions with human-annotated reviewer novelty assessments, the approach achieves 86.5% alignment with human reasoning and 75.3% agreement on novelty conclusions, substantially outperforming existing LLM-based baselines. The method produces detailed, literature-aware analyses and improves consistency over ad hoc reviewer judgments. These results highlight the potential for structured, LLM-assisted approaches to support more rigorous and transparent peer review without displacing human expertise. Data and code are made available.
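
To make the three-stage structure concrete, here is a minimal, illustrative Python sketch. It is not the authors' implementation: every name (`extract_claims`, `retrieve_related`, `assess_claim`, `ClaimAssessment`) is hypothetical, and each stage swaps the LLM-based component described above for a crude placeholder (a "we"-sentence heuristic for extraction, Jaccard token overlap for retrieval and comparison, and an arbitrary 0.5 threshold). It only shows how claims might flow through the pipeline and be judged independently.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class ClaimAssessment:
    claim: str
    closest_prior: Optional[str]  # best-matching prior-work snippet, if any
    overlap: float                # similarity between claim and that snippet
    novel: bool                   # True when overlap is below the threshold


def tokens(text: str) -> Set[str]:
    """Lowercased word set; a crude stand-in for a real text representation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of two texts' token sets (assumed similarity measure)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def extract_claims(submission: str) -> List[str]:
    """Stage 1 (placeholder): treat every sentence containing 'we' as a claim."""
    sentences = re.split(r"(?<=[.!?])\s+", submission.strip())
    return [s for s in sentences if re.search(r"\bwe\b", s, re.IGNORECASE)]


def retrieve_related(claim: str, corpus: List[str], k: int = 3) -> List[str]:
    """Stage 2 (placeholder): rank prior-work snippets by token overlap."""
    ranked = sorted(corpus, key=lambda doc: jaccard(claim, doc), reverse=True)
    return ranked[:k]


def assess_claim(claim: str, corpus: List[str],
                 threshold: float = 0.5) -> ClaimAssessment:
    """Stage 3 (placeholder): judge one claim against its retrieved evidence."""
    related = retrieve_related(claim, corpus)
    if not related:
        return ClaimAssessment(claim, None, 0.0, True)
    best = related[0]  # retrieve_related returns snippets sorted by overlap
    score = jaccard(claim, best)
    return ClaimAssessment(claim, best, score, score < threshold)


def assess_submission(submission: str, corpus: List[str],
                      threshold: float = 0.5) -> List[ClaimAssessment]:
    """Run all three stages; each claim is verified independently."""
    return [assess_claim(c, corpus, threshold)
            for c in extract_claims(submission)]
```

Calling `assess_submission(text, prior_snippets)` returns one `ClaimAssessment` per extracted claim, so each verdict stays traceable to a specific piece of prior work. In a realistic system each placeholder would presumably be replaced by a model- or search-backed component; the per-claim structure is the point of the sketch.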