PaperGuide: Making Small Language-Model Paper-Reading Agents More Efficient

📅 2026-01-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenges posed by the exponential growth of scientific literature and the inefficiency of existing large language model–based paper-reading agents, which often suffer from redundant exploration and poor planning. To this end, we propose PaperCompass, a novel framework that decouples high-level planning from fine-grained execution by first generating an abstract action sequence and then progressively instantiating function-call parameters. Inspired by cognitive science, we introduce Draft-and-Follow Policy Optimization (DFPO), a lightweight hierarchical reinforcement learning algorithm that effectively bridges the "knowing-doing gap" in smaller models. Evaluated on the Paper-QA benchmark, our approach achieves significantly improved reasoning efficiency, matching the performance of substantially larger models while maintaining stable and reliable training dynamics.

📝 Abstract
The accelerating growth of the scientific literature makes it increasingly difficult for researchers to track new advances through manual reading alone. Recent progress in large language models (LLMs) has therefore spurred interest in autonomous agents that can read scientific papers and extract task-relevant information. However, most existing approaches rely either on heavily engineered prompting or on a conventional SFT-RL training pipeline, both of which often lead to excessive and low-yield exploration. Drawing inspiration from cognitive science, we propose PaperCompass, a framework that mitigates these issues by separating high-level planning from fine-grained execution. PaperCompass first drafts an explicit plan that outlines the intended sequence of actions, and then performs detailed reasoning to instantiate each step by selecting the parameters for the corresponding function calls. To train such behavior, we introduce Draft-and-Follow Policy Optimization (DFPO), a tailored RL method that jointly optimizes both the draft plan and the final solution. DFPO can be viewed as a lightweight form of hierarchical reinforcement learning, aimed at narrowing the "knowing-doing" gap in LLMs. We provide a theoretical analysis that establishes DFPO's favorable optimization properties, supporting a stable and reliable training process. Experiments on paper-based question answering (Paper-QA) benchmarks show that PaperCompass improves efficiency over strong baselines without sacrificing performance, achieving results comparable to much larger models.
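The abstract's two-stage loop (draft an abstract action sequence first, then instantiate each step's function-call parameters) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the tool names (`search_paper`, `read_section`, `answer`) and both helper functions are hypothetical stand-ins for what would be LLM calls in a real agent.

```python
def draft_plan(question: str) -> list[str]:
    """Stage 1: draft an abstract action sequence with no parameters yet.
    A real system would sample this plan from an LLM; here we return a
    fixed template for illustration."""
    return ["search_paper", "read_section", "answer"]

def instantiate(action: str, question: str, context: dict) -> dict:
    """Stage 2: fill in concrete call parameters for one planned step,
    conditioned on the question and everything gathered so far."""
    if action == "search_paper":
        return {"query": question}
    if action == "read_section":
        return {"paper_id": context.get("paper_id", 0), "section": "Method"}
    return {"evidence": list(context.get("notes", []))}

def run(question: str) -> list[tuple[str, dict]]:
    """Follow the drafted plan step by step, instantiating each action."""
    context: dict = {}
    trace = []
    for action in draft_plan(question):
        params = instantiate(action, question, context)
        trace.append((action, params))      # tool execution would go here
        context.setdefault("notes", []).append(action)
    return trace

trace = run("How does the method improve efficiency?")
```

Separating the cheap plan draft from per-step parameter selection is what lets the agent avoid re-deciding its overall strategy at every turn, which is the redundant exploration the abstract describes.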
Problem

Research questions and friction points this paper is trying to address.

scientific literature
paper-reading agents
large language models
efficiency
autonomous agents
Innovation

Methods, ideas, or system contributions that make the work stand out.

PaperCompass
Draft-and-Follow Policy Optimization
hierarchical reinforcement learning
small language models
scientific paper reading
Zijian Wang
China University of Petroleum (East China)
RL · LLM · NLP
Tiancheng Huang
Nanyang Technological University
Deep Learning · Graph Neural Network · LiDAR · 3D Point Cloud
Hanqi Li
X-LANCE Lab, School of Computer Science, MoE Key Lab of Artificial Intelligence, SJTU AI Institute, Shanghai Jiao Tong University, Shanghai, China; Suzhou Laboratory, Suzhou, China
Da Ma
Assistant Professor, School of Medicine, Wake Forest University
Medical Image Computing · Computational Neuroanatomy · Radiogenomics · Neurodegenerative Disease
Lu Chen
School of Computer Science, Shanghai Jiao Tong University
Large Language Models · Dialogue Systems · AI for Science
Kai Yu
X-LANCE Lab, School of Computer Science, MoE Key Lab of Artificial Intelligence, SJTU AI Institute, Shanghai Jiao Tong University, Shanghai, China; Suzhou Laboratory, Suzhou, China