TreeReview: A Dynamic Tree of Questions Framework for Deep and Efficient LLM-based Scientific Peer Review

📅 2025-06-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current LLM-based peer review methods struggle to achieve depth, efficiency, and interpretability at once. To address this, we propose TreeReview, a dynamic hierarchical question-answering framework that models review as recursively constructing and bottom-up resolving a question tree: hierarchical question decomposition, LLM-driven on-demand question generation, bidirectional tree-structured reasoning, and answer aggregation jointly enable controllable depth and full process transparency. We construct a benchmark derived from ICLR and NeurIPS venues for review-oriented evaluation, and our dynamic question expansion mechanism reduces token consumption by up to 80% without compromising review quality. Extensive experiments under both human and LLM evaluation demonstrate that our method outperforms strong baselines: it achieves higher expert agreement and yields more comprehensive, in-depth, and expert-aligned review comments across diverse paper domains.

📝 Abstract
While Large Language Models (LLMs) have shown significant potential in assisting peer review, current methods often struggle to generate thorough and insightful reviews while maintaining efficiency. In this paper, we propose TreeReview, a novel framework that models paper review as a hierarchical and bidirectional question-answering process. TreeReview first constructs a tree of review questions by recursively decomposing high-level questions into fine-grained sub-questions and then resolves the question tree by iteratively aggregating answers from leaf to root to get the final review. Crucially, we incorporate a dynamic question expansion mechanism to enable deeper probing by generating follow-up questions when needed. We construct a benchmark derived from ICLR and NeurIPS venues to evaluate our method on full review generation and actionable feedback comments generation tasks. Experimental results of both LLM-based and human evaluation show that TreeReview outperforms strong baselines in providing comprehensive, in-depth, and expert-aligned review feedback, while reducing LLM token usage by up to 80% compared to computationally intensive approaches. Our code and benchmark dataset are available at https://github.com/YuanChang98/tree-review.
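The question-tree process described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation (which is at the linked repository): the names `QuestionNode`, `resolve`, and the stub LLM callables are hypothetical, and a real system would prompt an LLM for answering, aggregation, and follow-up generation, and would cap expansion depth.

```python
from dataclasses import dataclass, field

@dataclass
class QuestionNode:
    """One node in the review question tree (names are illustrative)."""
    question: str
    children: list["QuestionNode"] = field(default_factory=list)
    answer: str = ""

def resolve(node, answer_fn, aggregate_fn, expand_fn=None):
    """Resolve the tree bottom-up: answer leaf questions directly, then
    aggregate child answers into each parent's answer up to the root.
    expand_fn models the dynamic question-expansion mechanism: given a
    leaf's question and answer, it may return follow-up questions."""
    if not node.children:
        node.answer = answer_fn(node.question)
        if expand_fn:  # dynamic expansion: probe deeper only when needed
            for follow_up in expand_fn(node.question, node.answer):
                node.children.append(QuestionNode(follow_up))
    if node.children:
        answers = ([node.answer] if node.answer else []) + [
            resolve(c, answer_fn, aggregate_fn, expand_fn) for c in node.children
        ]
        node.answer = aggregate_fn(node.question, answers)
    return node.answer

# Demo with stub "LLM" calls: answers echo questions, aggregation joins answers,
# and one leaf triggers a follow-up question.
root = QuestionNode(
    "Write a full review of the paper.",
    children=[
        QuestionNode("Is the method novel?"),
        QuestionNode("Are the experiments convincing?"),
    ],
)
review = resolve(
    root,
    answer_fn=lambda q: f"[answer to: {q}]",
    aggregate_fn=lambda q, answers: " ".join(answers),
    expand_fn=lambda q, a: ["Which baselines are compared?"] if "experiments" in q else [],
)
```

After resolution, `root.answer` holds the aggregated review; the follow-up question generated for the experiments leaf is folded into its parent's answer before reaching the root.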
Problem

Research questions and friction points this paper is trying to address.

LLM-based peer review methods often trade depth for efficiency
How to model review as a hierarchical, bidirectional question-answering process
How to reduce token usage without sacrificing review depth and quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical and bidirectional question-answering process
Dynamic question expansion for deeper probing
Efficient token usage with up to 80% reduction
Yuan Chang
Cancer Virology Program, University of Pittsburgh
Cancer Virology
Ziyue Li
CS PhD, University of Maryland
Machine Learning
Hengyuan Zhang
Ph.D. Student, University of California San Diego
Robotics, Computer Vision, Autonomous Vehicles, Sensor Fusion
Yuanbo Kong
National Science Library, Chinese Academy of Sciences; Department of Information Resources Management, School of Economics and Management, University of Chinese Academy of Sciences
Yanru Wu
Tsinghua University
Zhijiang Guo
HKUST (GZ) | HKUST
Natural Language Processing, Machine Learning, Large Language Models
Ngai Wong
The University of Hong Kong