Towards Self-Evolving Agentic Literature Retrieval

📅 2026-05-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two limitations of current literature search: traditional keyword-based retrieval cannot capture complex scholarly intent, while large language models are costly to run and prone to hallucinating sources. To overcome these challenges, the authors propose PaSaMaster, a system that introduces the first "self-evolving agentic retrieval" paradigm, decoupling intent understanding from large-scale retrieval. By combining iterative intent analysis, evidence-driven relevance scoring, and lightweight models, PaSaMaster achieves efficient and accurate retrieval. Evaluated across 38 academic disciplines, the system improves F1 score by 15.6× over traditional keyword retrieval and outperforms GPT-5.2 by 30.0% at about 1% of GPT-5.2's computational cost, with zero source hallucination, balancing accuracy, efficiency, and reliability.
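
The headline results are stated in terms of F1 and source hallucination. The sketch below shows one common way to compute such metrics over paper IDs, assuming set-based precision and recall against a gold list of relevant papers, and counting a reference as hallucinated when it matches no paper in a verified corpus. The function names are hypothetical, and the PaSaMaster Benchmark's exact metric definitions may differ.

```python
from collections.abc import Iterable


def retrieval_f1(retrieved: Iterable[str], relevant: Iterable[str]) -> float:
    """Set-based F1 between returned paper IDs and a gold set of relevant IDs."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    if hits == 0:
        return 0.0  # also covers empty inputs, avoiding division by zero
    precision = hits / len(retrieved_set)
    recall = hits / len(relevant_set)
    return 2 * precision * recall / (precision + recall)


def hallucination_rate(cited: Iterable[str], verified_corpus: set[str]) -> float:
    """Share of cited references that match no paper in a verified corpus."""
    cited_list = list(cited)
    if not cited_list:
        return 0.0
    unmatched = sum(1 for ref in cited_list if ref not in verified_corpus)
    return unmatched / len(cited_list)


# Toy usage with made-up paper IDs.
print(round(retrieval_f1(["p1", "p2", "p3", "p4"], ["p2", "p3", "p5"]), 4))  # 0.5714
print(hallucination_rate(["p1", "fake-1", "p2", "fake-2"], {"p1", "p2", "p3"}))  # 0.5
```

A system that only ever returns IDs drawn from its own corpus scores a hallucination rate of 0 under this definition by construction, whereas a generator that writes references as free text can score above 0.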

📝 Abstract

As large language models reshape scientific research, literature retrieval faces a twofold challenge: ensuring source authenticity while maintaining a deep comprehension of academic search intents. While reliable, traditional keyword-centric search fails to capture complex research intents. Frontier LLMs can handle complex research intents, but their high cost and tendency to hallucinate remain key limitations. Here we introduce PaSaMaster, a self-evolving agentic literature retrieval system that produces relevance-scored paper rankings with evidence-grounded recommendations through iterative intent analysis, retrieval, and ranking. It is built on three key designs. First, it transforms literature retrieval from a one-shot query–document matching problem into a search process that evolves over time, using ranked evidence to reveal gaps, refine intents, and guide follow-up searches. Second, it prevents hallucinated sources by treating retrieval as intent–paper relevance ranking rather than generation. Finally, PaSaMaster improves cost efficiency by separating planning from retrieval: a frontier LLM is used only for intent understanding, while large-scale retrieval and relevance scoring are delegated to customized corpora and lightweight models. Evaluated on the PaSaMaster Benchmark across 38 scientific disciplines, our system exposes the severe inaccuracy and incompleteness of traditional keyword retrieval (improving F1 score by 15.6×) and the unreliability of generative LLMs (which exhibit hallucination rates of up to 37.79%). Remarkably, PaSaMaster outperforms GPT-5.2 by 30.0% at a mere 1% of the computational cost while ensuring zero source hallucination. Repository: https://github.com/sjtu-sai-agents/PaSaMaster
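
The iterative design in the abstract (rank the evidence, find what is still uncovered, search again) can be illustrated with a toy retrieve-rank-refine loop. This is not PaSaMaster's code. Every component is an assumption: `keyword_retrieve` is a budgeted keyword search standing in for large-scale retrieval over a customized corpus, `overlap_score` is a hypothetical stand-in for the lightweight relevance model, and "gaps" are reduced to intent terms that no retrieved paper mentions yet. The frontier-LLM planning step is not modeled; follow-up queries are simply one query per uncovered term.

```python
import re
from typing import Callable

Scorer = Callable[[set[str], str], float]


def tokenize(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for real text processing."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_retrieve(corpus: dict[str, str], query: set[str], k: int) -> list[str]:
    """Budgeted keyword search: IDs of the k papers sharing the most query terms."""
    hits = [(len(query & tokenize(text)), pid) for pid, text in corpus.items()]
    hits = sorted((h for h in hits if h[0] > 0), key=lambda h: (-h[0], h[1]))
    return [pid for _, pid in hits[:k]]


def overlap_score(intent_terms: set[str], text: str) -> float:
    """Hypothetical lightweight scorer: fraction of intent terms the paper mentions."""
    if not intent_terms:
        return 0.0
    return len(intent_terms & tokenize(text)) / len(intent_terms)


def self_evolving_search(
    corpus: dict[str, str],
    intent: str,
    k: int = 2,
    max_rounds: int = 3,
    scorer: Scorer = overlap_score,
) -> list[tuple[str, float]]:
    intent_terms = tokenize(intent)
    queries = [intent_terms]  # round 1: search with the whole intent
    seen: set[str] = set()
    ranked: list[tuple[str, float]] = []
    for _ in range(max_rounds):
        before = len(seen)
        for query in queries:
            seen.update(keyword_retrieve(corpus, query, k))
        # Rank every retrieved paper; IDs can only come from the corpus keys.
        ranked = sorted(
            ((pid, scorer(intent_terms, corpus[pid])) for pid in seen),
            key=lambda item: (-item[1], item[0]),
        )
        # Gap analysis: intent terms that no retrieved paper addresses yet.
        covered: set[str] = set().union(*(tokenize(corpus[pid]) for pid in seen))
        gaps = intent_terms - covered
        if not gaps or len(seen) == before:
            break  # nothing left to cover, or no new evidence was found
        queries = [{term} for term in sorted(gaps)]  # one focused follow-up per gap
    return ranked


corpus = {
    "p1": "agentic literature retrieval with planning",
    "p2": "literature retrieval benchmark",
    "p3": "keyword literature retrieval baseline",
    "p4": "measuring hallucination in citation generation",
    "p5": "protein folding",
}
print(self_evolving_search(corpus, "agentic literature retrieval hallucination"))
# [('p1', 0.75), ('p2', 0.5), ('p4', 0.25)]
```

With `max_rounds=1` the same call behaves like one-shot keyword search and returns only p1 and p2, missing the hallucination-related paper p4; the follow-up query on the uncovered term is what surfaces it. Because every returned ID is a key of `corpus`, the output cannot name a paper that does not exist, which mirrors the abstract's point about treating retrieval as ranking rather than generation.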

Problem

Research questions and friction points this paper is trying to address.

literature retrieval

source authenticity

research intent comprehension

hallucination

keyword-centric search

Innovation

Methods, ideas, or system contributions that make the work stand out.

self-evolving retrieval

evidence-grounded recommendation

intent-aware ranking