Rethinking Agentic Search with Pi-Serini: Is Lexical Retrieval Sufficient?

📅 2026-05-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study asks whether a well-tuned lexical retriever (BM25) is still sufficient for effective deep research now that large language models (LLMs) have strong reasoning and tool-use capabilities. The authors introduce Pi-Serini, a search agent with three tools for retrieving, browsing, and reading documents, and pair a frontier LLM (gpt-5.5) with a carefully configured BM25 retriever. On BrowseComp-Plus, the system reaches 83.1% answer accuracy and 94.7% surfaced evidence recall, outperforming released search agents that rely on dense retrievers. The results suggest that capable LLMs paired with a tuned BM25 at sufficient retrieval depth can support deep research, challenging the assumption that more complex retrieval architectures are necessary.

📝 Abstract

Does a lexical retriever suffice as large language models (LLMs) become more capable in an agentic loop? This question naturally arises when building deep research systems. We revisit it by pairing BM25 with frontier LLMs that have better reasoning and tool-use abilities. To support researchers asking the same question, we introduce Pi-Serini, a search agent equipped with three tools for retrieving, browsing, and reading documents. Our results show that, on BrowseComp-Plus, a well-configured lexical retriever with sufficient retrieval depth can support effective deep research when paired with more capable LLMs. Specifically, Pi-Serini with gpt-5.5 achieves 83.1% answer accuracy and 94.7% surfaced evidence recall, outperforming released search agents that use dense retrievers. Controlled ablations further show that BM25 tuning improves answer accuracy by 18.0% and surfaced evidence recall by 11.1% over the default BM25 setting, while increasing retrieval depth further improves surfaced evidence recall by 25.3% over the shallow-retrieval setting. Source code is available at https://github.com/justram/pi-serini.
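The abstract names two levers, BM25 tuning and retrieval depth, without spelling out the mechanics. The sketch below is a minimal, self-contained illustration of what those levers usually mean, not the Pi-Serini implementation. It assumes that "tuning" refers to the standard BM25 parameters k1 (term-frequency saturation) and b (length normalization) and that "depth" is the number of hits k returned per query. The class name BM25, the helper surfaced_recall, the whitespace tokenizer, and the default values k1=0.9 and b=0.4 are all illustrative choices, and the recall helper is only one plausible reading of "surfaced evidence recall".

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


class BM25:
    """Toy in-memory BM25 index over a dict of {doc_id: text}."""

    def __init__(self, docs, k1=0.9, b=0.4):
        # k1 and b are the two knobs usually meant by "BM25 tuning".
        # The defaults here are illustrative, not the paper's values.
        self.k1 = k1
        self.b = b
        self.doc_ids = list(docs)
        self.tfs = {d: Counter(tokenize(t)) for d, t in docs.items()}
        self.lens = {d: sum(tf.values()) for d, tf in self.tfs.items()}
        self.avgdl = sum(self.lens.values()) / len(self.lens)
        self.n = len(docs)
        self.df = Counter()
        for tf in self.tfs.values():
            self.df.update(tf.keys())

    def idf(self, term):
        # Non-negative IDF variant: log(1 + (N - df + 0.5) / (df + 0.5)).
        df = self.df.get(term, 0)
        return math.log(1 + (self.n - df + 0.5) / (df + 0.5))

    def score(self, query, doc_id):
        tf = self.tfs[doc_id]
        # Length normalization: b=0 ignores length, b=1 normalizes fully.
        norm = self.k1 * (1 - self.b + self.b * self.lens[doc_id] / self.avgdl)
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f:
                total += self.idf(term) * f * (self.k1 + 1) / (f + norm)
        return total

    def search(self, query, k=10):
        # k is the retrieval depth: how many ranked hits the agent sees.
        scored = [(d, self.score(query, d)) for d in self.doc_ids]
        scored = [(d, s) for d, s in scored if s > 0]
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        return scored[:k]


def surfaced_recall(retrieved_ids, gold_ids):
    # Assumed definition: fraction of gold evidence documents that
    # appear anywhere among the retrieved documents.
    gold = set(gold_ids)
    return len(gold & set(retrieved_ids)) / len(gold) if gold else 0.0


if __name__ == "__main__":
    docs = {
        "d1": "bm25 lexical retrieval baseline",
        "d2": "dense retrieval with neural encoders",
        "d3": "cooking pasta at home",
    }
    index = BM25(docs)
    for depth in (1, 2):
        hits = [doc_id for doc_id, _ in index.search("lexical retrieval", k=depth)]
        print(depth, hits, surfaced_recall(hits, ["d1", "d2"]))
```

In this toy setup a shallow depth can miss gold evidence that a deeper ranked list surfaces, which is the general effect the ablations measure. In practice one would sweep k1 and b on held-out queries instead of fixing them by hand.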

Problem

Research questions and friction points this paper is trying to address.

agentic search

lexical retrieval

large language models

deep research

BM25

Innovation

Methods, ideas, or system contributions that make the work stand out.

lexical retrieval

agentic search

BM25 tuning