🤖 AI Summary
To address the challenges of long-text modeling and poor interpretability in automated academic paper peer review, this paper proposes a graph-structured document representation method: papers are modeled as semantic graphs, and cross-paragraph key units are adaptively extracted based on semantic connectivity and paragraph-level contribution. A collaborative GNN-LLM architecture is then introduced to generate review comments while enabling multi-task transfer (e.g., question answering and summarization). This approach alleviates reliance on excessively long LLM inputs, significantly improving both computational efficiency and modeling accuracy. Empirical evaluation demonstrates an average 58.72% improvement over state-of-the-art baselines across multiple metrics. Notably, the method achieves substantial gains in review quality, consistency, and interpretability—enabling transparent, fine-grained justification of review decisions through the learned graph structure.
📝 Abstract
Generating a review for an academic research paper is a complex task that requires a deep understanding of the document's content and the interdependencies between its sections. It demands not only insight into technical details but also an appreciation of the paper's overall coherence and structure. Recent methods have predominantly focused on fine-tuning large language models (LLMs) to address this challenge. However, they often overlook the computational and performance limitations imposed by long input token lengths. To address this, we introduce AutoRev, an Automatic Peer Review System for Academic Research Papers. Our novel framework represents an academic document as a graph, enabling the extraction of the most critical passages that contribute significantly to the review. This graph-based approach demonstrates effectiveness for review generation and is potentially adaptable to various downstream tasks, such as question answering, summarization, and document representation. When applied to review generation, our method outperforms SOTA baselines by an average of 58.72% across all evaluation metrics. We hope that our work will stimulate further research in applying graph-based extraction techniques to other downstream tasks in NLP. We plan to make our code public upon acceptance.