PaSa: An LLM Agent for Comprehensive Academic Paper Search

📅 2025-01-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current academic search tools are not accurate or comprehensive enough for complex queries. To address this, we propose PaSa, the first autonomous decision-making large language model agent tailored for scholarly literature retrieval. It uses a multi-stage retrieval agent architecture, optimized with reinforcement learning, that supports automatic tool invocation, paper reading, and citation filtering. To mitigate the scarcity of annotated data, we introduce two benchmarks: AutoScholarQuery, a synthetically generated dataset, and RealScholarQuery, a real-world query benchmark. Evaluated on RealScholarQuery, PaSa-7B significantly outperforms strong baselines: it achieves absolute gains of 37.78% in recall@20 and 39.90% in recall@50 over Google with GPT-4o, and improves recall by 30.36% and precision by 4.25% over PaSa-GPT-4o. This work establishes a new paradigm for academic search agents and provides reproducible, standardized evaluation benchmarks.
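The search, read, and filter cycle described above can be pictured with a small sketch. This is not PaSa's implementation: the function `run_search_agent`, its parameters, and the toy data are all hypothetical, and the three decisions that PaSa delegates to a trained LLM are replaced here by plain callables. The choice to follow citations only from papers judged relevant is likewise an assumption made for brevity.

```python
from collections import deque
from typing import Callable, List, Set


def run_search_agent(
    query: str,
    search_tool: Callable[[str], List[str]],    # hypothetical: query -> paper ids
    read_citations: Callable[[str], List[str]],  # hypothetical: paper id -> cited ids
    is_relevant: Callable[[str, str], bool],     # hypothetical: (query, paper id) -> keep?
    max_papers: int = 50,
) -> List[str]:
    """Illustrative loop: search, read papers, follow citations, keep relevant ones."""
    queue: deque = deque()
    seen: Set[str] = set()
    for paper in search_tool(query):
        if paper not in seen:
            seen.add(paper)
            queue.append(paper)

    selected: List[str] = []
    visited = 0
    while queue and visited < max_papers:
        paper = queue.popleft()
        visited += 1
        if not is_relevant(query, paper):
            continue  # assumption: irrelevant papers are not expanded further
        selected.append(paper)
        for cited in read_citations(paper):
            if cited not in seen:
                seen.add(cited)
                queue.append(cited)
    return selected


# Toy data standing in for a search engine, a citation graph, and relevance labels.
TOY_INDEX = {"agents for paper search": ["A", "B"]}
TOY_CITATIONS = {"A": ["C", "D"], "B": ["A"], "C": [], "D": ["E"]}
TOY_RELEVANT = {"A", "C", "E"}

found = run_search_agent(
    "agents for paper search",
    search_tool=lambda q: TOY_INDEX.get(q, []),
    read_citations=lambda p: TOY_CITATIONS.get(p, []),
    is_relevant=lambda q, p: p in TOY_RELEVANT,
)
print(found)
```

In this toy run, paper E is never reached because it is cited only by D, which the relevance check rejects. That illustrates why the filtering decision affects how comprehensive the final result set is.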

📝 Abstract
We introduce PaSa, an advanced Paper Search agent powered by large language models. PaSa can autonomously make a series of decisions, including invoking search tools, reading papers, and selecting relevant references, to ultimately obtain comprehensive and accurate results for complex scholarly queries. We optimize PaSa using reinforcement learning with a synthetic dataset, AutoScholarQuery, which includes 35k fine-grained academic queries and corresponding papers sourced from top-tier AI conference publications. Additionally, we develop RealScholarQuery, a benchmark collecting real-world academic queries to assess PaSa's performance in more realistic scenarios. Despite being trained on synthetic data, PaSa significantly outperforms existing baselines on RealScholarQuery, including Google, Google Scholar, Google with GPT-4o for paraphrased queries, ChatGPT (search-enabled GPT-4o), GPT-o1, and PaSa-GPT-4o (PaSa implemented by prompting GPT-4o). Notably, PaSa-7B surpasses the best Google-based baseline, Google with GPT-4o, by 37.78% in recall@20 and 39.90% in recall@50. It also exceeds PaSa-GPT-4o by 30.36% in recall and 4.25% in precision. Model, datasets, and code are available at https://github.com/bytedance/pasa.
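For readers unfamiliar with the metrics quoted in the abstract, the following sketch shows the standard definitions of recall@k and precision for a retrieval result. The abstract does not spell out the paper's exact evaluation protocol, so the helper names `recall_at_k` and `precision`, and the toy paper ids, are illustrative assumptions rather than the authors' evaluation code.

```python
from typing import List, Set


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant papers that appear in the top-k ranked results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def precision(retrieved: List[str], relevant: Set[str]) -> float:
    """Fraction of the retrieved papers that are relevant."""
    retrieved_set = set(retrieved)
    if not retrieved_set:
        return 0.0
    return len(retrieved_set & relevant) / len(retrieved_set)


# Toy example: five ranked results, four ground-truth relevant papers.
ranked_results = ["p1", "p7", "p3", "p9", "p2"]
ground_truth = {"p1", "p2", "p3", "p4"}

print(recall_at_k(ranked_results, ground_truth, k=3))  # 2 of 4 relevant found -> 0.5
print(recall_at_k(ranked_results, ground_truth, k=5))  # 3 of 4 relevant found -> 0.75
print(precision(ranked_results, ground_truth))         # 3 of 5 retrieved are relevant -> 0.6
```

Recall@k rewards finding more of the relevant literature within a fixed result budget, while precision penalizes returning irrelevant papers. The two are reported separately because an agent can raise one at the expense of the other.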
Problem

Research questions and friction points this paper is trying to address.

Academic Paper Search
Accuracy
Comprehensiveness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement Learning
AutoScholarQuery Dataset
Performance Improvement
Yichen He
ByteDance Research
Guanhua Huang
ByteDance Research
Peiyuan Feng
ByteDance Research
Yuan Lin
Ocean College, Zhejiang University
Rheology, Polymer physics, Multi-phase flow
Yuchen Zhang
ByteDance Research
Hang Li
ByteDance Research
Weinan E
Peking University