Large Language Models for Information Retrieval: A Survey

📅 2023-08-14
🏛️ ACM Transactions on Information Systems
📈 Citations: 355
Influential citations: 10
🤖 AI Summary
Information retrieval (IR) faces persistent challenges including data scarcity, limited interpretability, and generated responses that are plausible but inaccurate. To address these, this work systematically surveys recent advances in large language model (LLM)-enhanced IR across five core components: query rewriting, retrieval, re-ranking, reading comprehension, and search agents. Methodologically, it covers classical sparse retrieval (e.g., BM25), neural dense retrieval (e.g., DPR, ANCE), LLM-driven rewriting and re-ranking, generative reading comprehension, and chain-of-thought-enabled search agents. The paper makes three key contributions: (1) the first comprehensive, full-stack technical taxonomy of LLM-IR; (2) a unified classification framework for LLM-augmented IR techniques; and (3) identification of a novel co-evolutionary paradigm in which the efficiency of sparse retrieval and the semantic understanding of neural models reinforce each other. Synthesizing over 100 studies, it clarifies critical bottlenecks and open problems, delivering a foundational roadmap for LLM-enhanced IR that is both theoretically grounded and practically actionable.
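The summary cites BM25 as the classical sparse retriever. As a rough illustration (not code from the survey), the sketch below scores documents with Okapi BM25, assuming simple lowercase whitespace tokenization, the commonly used defaults k1 = 1.5 and b = 0.75, and an IDF variant with +1 inside the logarithm so term weights stay non-negative. The function name `bm25_scores` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with Okapi BM25 (whitespace tokenization)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs  # average document length

    # Document frequency: number of documents that contain each term.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        dl = len(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # IDF with +1 inside the log, keeping weights non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            # Saturating term frequency, normalized by document length.
            score += idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * dl / avgdl))
        scores.append(score)
    return scores


docs = [
    "large language models for retrieval",
    "sparse retrieval with bm25",
    "cooking recipes for pasta",
]
print(bm25_scores("bm25 retrieval", docs))
```

Documents that match more query terms, and rarer terms, score higher; a document sharing no terms with the query scores 0.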
📝 Abstract
As a primary means of information acquisition, information retrieval (IR) systems, such as search engines, have integrated themselves into our daily lives. These systems also serve as components of dialogue, question-answering, and recommender systems. The trajectory of IR has evolved dynamically from its origins in term-based methods to its integration with advanced neural models. While the neural models excel at capturing complex contextual signals and semantic nuances, they still face challenges such as data scarcity, interpretability, and the generation of contextually plausible yet potentially inaccurate responses. This evolution requires a combination of traditional methods (such as term-based sparse retrieval methods with rapid response) and modern neural architectures (such as language models with powerful language understanding capacity). Meanwhile, the emergence of large language models (LLMs) has revolutionized natural language processing due to their remarkable language understanding, generation, and reasoning abilities. Consequently, recent research has sought to leverage LLMs to improve IR systems. Given the rapid evolution of this research trajectory, it is necessary to consolidate existing methodologies and provide nuanced insights through a comprehensive overview. In this survey, we delve into the confluence of LLMs and IR systems, including crucial aspects such as query rewriters, retrievers, rerankers, readers, and search agents.
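The abstract argues for combining term-based sparse retrieval with neural models. One widely used and simple way to merge the ranked outputs of a sparse and a dense retriever is Reciprocal Rank Fusion (RRF); it is shown here only as an example, not as the method the survey prescribes. The sketch assumes each retriever returns a ranked list of document IDs and uses k = 60, a common default; the function name and document IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion."""
    fused: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); larger k flattens the gap between top ranks.
            fused[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(fused, key=lambda d: (-fused[d], d))


sparse_ranking = ["d1", "d2", "d3"]  # e.g. from a BM25 retriever
dense_ranking = ["d3", "d1", "d4"]   # e.g. from a dense retriever
print(reciprocal_rank_fusion([sparse_ranking, dense_ranking]))  # ['d1', 'd3', 'd2', 'd4']
```

Because RRF uses only ranks, the sparse and dense scores never need to be calibrated against each other.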
Problem

Research questions and friction points this paper is trying to address.

Leveraging large language models to improve information retrieval systems
Addressing challenges like data scarcity and response accuracy in IR
Surveying integration of LLMs in query, retrieval, and ranking components
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging large language models for IR
Combining traditional and neural retrieval methods
Integrating LLMs in query and retrieval components
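The survey organizes LLM-enhanced IR around a query rewriter, retriever, reranker, and reader. A minimal sketch of how these stages might be chained, assuming each stage is a plain Python callable; the `SearchPipeline` class, the `toy_*` functions, and the three-sentence corpus are hypothetical stand-ins for what would be LLM or neural components in practice.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; in practice each could be backed by an LLM or neural model.
Rewriter = Callable[[str], str]
Retriever = Callable[[str, int], list[str]]
Reranker = Callable[[str, list[str]], list[str]]
Reader = Callable[[str, list[str]], str]


@dataclass
class SearchPipeline:
    rewrite: Rewriter
    retrieve: Retriever
    rerank: Reranker
    read: Reader
    top_k: int = 10  # candidates fetched by the first-stage retriever
    top_n: int = 3   # passages passed on to the reader after reranking

    def answer(self, query: str) -> str:
        rewritten = self.rewrite(query)
        candidates = self.retrieve(rewritten, self.top_k)
        evidence = self.rerank(rewritten, candidates)[: self.top_n]
        return self.read(query, evidence)


CORPUS = [
    "BM25 is a term-based sparse retrieval function.",
    "Dense retrievers encode queries and passages as vectors.",
    "Pasta should be cooked in salted water.",
]


def toy_rewrite(query: str) -> str:
    # Stand-in for an LLM rewriter: lowercase and drop question marks.
    return query.lower().replace("?", "")


def toy_retrieve(query: str, k: int) -> list[str]:
    # Stand-in retriever: rank documents by the number of query terms they share.
    terms = set(query.split())
    scored = [(len(terms & set(doc.lower().split())), doc) for doc in CORPUS]
    scored.sort(key=lambda pair: -pair[0])
    return [doc for overlap, doc in scored if overlap > 0][:k]


def toy_read(query: str, evidence: list[str]) -> str:
    # Stand-in for an LLM reader that would generate an answer grounded in the evidence.
    if not evidence:
        return "No supporting passages found."
    return f"{query} -> {evidence[0]}"


pipeline = SearchPipeline(
    rewrite=toy_rewrite,
    retrieve=toy_retrieve,
    rerank=lambda query, docs: docs,  # identity reranker; an LLM reranker would reorder here
    read=toy_read,
)
print(pipeline.answer("What is sparse retrieval?"))
# What is sparse retrieval? -> BM25 is a term-based sparse retrieval function.
```

Keeping each stage behind a callable makes it possible to swap one component, for instance the term-overlap retriever for BM25 or a dense retriever, without touching the rest of the pipeline.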
Yutao Zhu
Gaoling School of Artificial Intelligence and School of Information, Renmin University of China

Huaying Yuan
Gaoling School of Artificial Intelligence and School of Information, Renmin University of China

Shuting Wang
Gaoling School of Artificial Intelligence, Renmin University of China
Information Retrieval, Retrieval-augmented Generation

Jiongnan Liu
Gaoling School of Artificial Intelligence, Renmin University of China
Information Retrieval

Wenhan Liu
Gaoling School of Artificial Intelligence, Renmin University of China
Information Retrieval, Large Language Models

Chenlong Deng
Gaoling School of Artificial Intelligence and School of Information, Renmin University of China

Zhicheng Dou
Renmin University of China
Information Retrieval, Retrieval Augmented Generation, Large Language Models, Generative IR

Ji-Rong Wen
Gaoling School of Artificial Intelligence and School of Information, Renmin University of China