🤖 AI Summary
Current LLM-based peer review methods struggle to simultaneously achieve depth, efficiency, and interpretability. To address this, the authors propose TreeReview, a dynamic hierarchical question-answering framework that models review as recursively constructing, then solving bottom-up, a tree of review questions: hierarchical question decomposition, LLM-driven on-demand question generation, bidirectional tree-structured reasoning, and answer aggregation jointly enable controllable depth and full process transparency. They introduce a benchmark for review-oriented evaluation derived from ICLR and NeurIPS, and design a dynamic question expansion mechanism that reduces token consumption by up to 80% without compromising review quality. Extensive experiments under both human and LLM evaluation demonstrate that the method significantly outperforms strong baselines: it achieves higher expert agreement, yields more comprehensive and insightful review comments, and exhibits superior robustness and fidelity across diverse paper domains.
📝 Abstract
While Large Language Models (LLMs) have shown significant potential in assisting peer review, current methods often struggle to generate thorough and insightful reviews while maintaining efficiency. In this paper, we propose TreeReview, a novel framework that models paper review as a hierarchical and bidirectional question-answering process. TreeReview first constructs a tree of review questions by recursively decomposing high-level questions into fine-grained sub-questions, and then resolves the question tree by iteratively aggregating answers from leaf to root to produce the final review. Crucially, we incorporate a dynamic question expansion mechanism that enables deeper probing by generating follow-up questions when needed. We construct a benchmark derived from ICLR and NeurIPS venues to evaluate our method on full review generation and actionable feedback comment generation. Experimental results from both LLM-based and human evaluation show that TreeReview outperforms strong baselines in providing comprehensive, in-depth, and expert-aligned review feedback, while reducing LLM token usage by up to 80% compared to computationally intensive approaches. Our code and benchmark dataset are available at https://github.com/YuanChang98/tree-review.
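The abstract's two-phase process (top-down question decomposition, then bottom-up answer aggregation with on-demand follow-up questions) can be sketched in a few lines. This is a minimal illustration, not the paper's actual implementation: the node structure, function names, and the deterministic toy stand-ins for the LLM calls (`toy_decompose`, `toy_answer`, `toy_aggregate`, `toy_expand`) are all assumptions made for clarity.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class QuestionNode:
    """One node of the review-question tree."""
    question: str
    children: List["QuestionNode"] = field(default_factory=list)
    answer: str = ""

def build_tree(question: str, decompose: Callable[[str], List[str]],
               depth: int = 0, max_depth: int = 2) -> QuestionNode:
    """Top-down phase: recursively decompose a high-level question
    into fine-grained sub-questions, up to a depth limit."""
    node = QuestionNode(question)
    if depth < max_depth:
        for sub in decompose(question):
            node.children.append(build_tree(sub, decompose, depth + 1, max_depth))
    return node

def resolve(node: QuestionNode,
            answer_fn: Callable[[str], str],
            aggregate_fn: Callable[[str, List[str]], str],
            expand_fn: Optional[Callable[[str, List[str]], List[str]]] = None) -> str:
    """Bottom-up phase: answer leaves directly; at internal nodes,
    aggregate child answers, optionally expanding with follow-up
    questions first (the dynamic question expansion mechanism)."""
    if not node.children:
        node.answer = answer_fn(node.question)
    else:
        child_answers = [resolve(c, answer_fn, aggregate_fn, expand_fn)
                         for c in node.children]
        if expand_fn is not None:
            for follow_up in expand_fn(node.question, child_answers):
                leaf = QuestionNode(follow_up)
                leaf.answer = answer_fn(follow_up)
                node.children.append(leaf)
                child_answers.append(leaf.answer)
        node.answer = aggregate_fn(node.question, child_answers)
    return node.answer

# Deterministic toy stand-ins for the LLM calls (illustration only).
def toy_decompose(q: str) -> List[str]:
    return [f"{q} / sub{i}" for i in (1, 2)]

def toy_answer(q: str) -> str:
    return f"answer({q})"

def toy_aggregate(q: str, answers: List[str]) -> str:
    return f"agg({q}: {len(answers)} answers)"

def toy_expand(q: str, answers: List[str]) -> List[str]:
    return []  # no follow-up questions triggered in this toy run

root = build_tree("Is this paper sound?", toy_decompose)
review = resolve(root, toy_answer, toy_aggregate, toy_expand)
```

In the real framework each of the three callables would be an LLM prompt, and `expand_fn` would decide from the child answers whether deeper probing is warranted; here it is a no-op so the example runs without any model.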