PaSa: An LLM Agent for Comprehensive Academic Paper Search

📅 2025-01-17

🤖 AI Summary

Existing academic search tools are often neither accurate nor comprehensive enough for complex queries. To address this, we propose PaSa, an autonomous large language model agent for scholarly literature retrieval that is optimized with reinforcement learning and can invoke search tools, read papers, and select relevant references. To mitigate the scarcity of annotated data, we introduce two datasets: AutoScholarQuery, a synthetically generated dataset used for training, and RealScholarQuery, a benchmark of real-world queries. On RealScholarQuery, PaSa-7B outperforms the strongest Google-based baseline, Google with GPT-4o, by 37.78% in recall@20 and 39.90% in recall@50, and exceeds PaSa-GPT-4o by 30.36% in recall and 4.25% in precision. The model, datasets, and code are publicly released.
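
The search, read, and select cycle described above can be illustrated with a minimal sketch. This is not PaSa's implementation: the toy corpus, the function names (`search`, `is_relevant`, `run_agent`), and the word-overlap relevance test are all assumptions made for illustration, standing in for real search tools and LLM decisions.

```python
from collections import deque

# Hypothetical toy corpus (an assumption, not data from the paper).
CORPUS = {
    "p1": {"title": "Retrieval agents trained with reinforcement learning",
           "abstract": "an agent for literature retrieval", "cites": ["p2", "p3"]},
    "p2": {"title": "Dense passage encoders",
           "abstract": "encoders for passage retrieval", "cites": ["p4"]},
    "p3": {"title": "Protein folding survey",
           "abstract": "structure prediction methods", "cites": []},
    "p4": {"title": "Scholarly benchmarks",
           "abstract": "benchmarks for retrieval evaluation", "cites": []},
}

def _words(text):
    return set(text.lower().split())

def search(query):
    """Stand-in search tool: matches the query against titles only."""
    return [pid for pid, paper in CORPUS.items()
            if _words(query) & _words(paper["title"])]

def is_relevant(query, paper):
    """Stand-in for reading a paper; a real agent would ask an LLM here."""
    return bool(_words(query) & _words(paper["abstract"]))

def run_agent(query, max_steps=10):
    """Search, then read queued papers and follow citations of relevant ones."""
    queue = deque(search(query))
    seen = set(queue)
    selected = []
    steps = 0
    while queue and steps < max_steps:
        pid = queue.popleft()
        steps += 1
        paper = CORPUS[pid]
        if not is_relevant(query, paper):
            continue  # do not expand citations of irrelevant papers
        selected.append(pid)
        for cited in paper["cites"]:
            if cited not in seen:
                seen.add(cited)
                queue.append(cited)
    return selected
```

In this toy setup, the title-only search for "retrieval" finds a single paper, and the remaining relevant papers are reached only by following citations, which is the kind of coverage gain that citation expansion is meant to provide.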

📝 Abstract

We introduce PaSa, an advanced Paper Search agent powered by large language models. PaSa can autonomously make a series of decisions, including invoking search tools, reading papers, and selecting relevant references, to ultimately obtain comprehensive and accurate results for complex scholarly queries. We optimize PaSa using reinforcement learning with a synthetic dataset, AutoScholarQuery, which includes 35k fine-grained academic queries and corresponding papers sourced from top-tier AI conference publications. Additionally, we develop RealScholarQuery, a benchmark collecting real-world academic queries to assess PaSa's performance in more realistic scenarios. Despite being trained on synthetic data, PaSa significantly outperforms existing baselines on RealScholarQuery, including Google, Google Scholar, Google with GPT-4o for paraphrased queries, ChatGPT (search-enabled GPT-4o), GPT-o1, and PaSa-GPT-4o (PaSa implemented by prompting GPT-4o). Notably, PaSa-7B surpasses the best Google-based baseline, Google with GPT-4o, by 37.78% in recall@20 and 39.90% in recall@50. It also exceeds PaSa-GPT-4o by 30.36% in recall and 4.25% in precision. Model, datasets, and code are available at https://github.com/bytedance/pasa.
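
For readers unfamiliar with the metrics quoted above, the following sketch shows the standard definitions of recall@k and precision for a single query. The function names and the example identifiers are assumptions for illustration; the paper's exact evaluation protocol (for example, how scores are averaged across queries) is not specified here.

```python
def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant papers that appear in the top-k of a ranked list."""
    if not relevant:
        return 0.0
    top_k = set(ranked[:k])
    return len(top_k & set(relevant)) / len(set(relevant))

def precision(returned, relevant):
    """Fraction of the returned papers that are relevant."""
    returned_set = set(returned)
    if not returned_set:
        return 0.0
    return len(returned_set & set(relevant)) / len(returned_set)

# Hypothetical example: four returned papers, three relevant papers overall.
ranked = ["a", "b", "c", "d"]
relevant = {"a", "d", "e"}

r_at_2 = recall_at_k(ranked, relevant, 2)  # only "a" is in the top 2
r_at_4 = recall_at_k(ranked, relevant, 4)  # "a" and "d" are in the top 4
p = precision(ranked, relevant)            # 2 of the 4 returned are relevant
```

Recall@k rewards finding more of the relevant papers within a fixed budget of results, while precision penalizes returning irrelevant ones, so the two together describe both the comprehensiveness and the accuracy of a search system.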

Problem

Research questions and friction points this paper is trying to address.

Academic Paper Search

Accuracy

Comprehensiveness

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement Learning

AutoScholarQuery Dataset

Performance Improvement

🔎 Similar Papers

ResearchArena: Benchmarking Large Language Models' Ability to Collect and Organize Information as Research Agents