🤖 AI Summary
This survey addresses the lack of a systematic framework in reasoning-intensive retrieval (RIR) research. It organizes existing RIR benchmarks by knowledge domain and modality, and introduces a structured taxonomy that categorizes methods by where and how reasoning is integrated into the retrieval pipeline, together with their trade-offs and practical applications. The coverage spans the pipeline from benchmarks to retrievers and rerankers that draw on the reasoning abilities of large language models. The survey thereby consolidates a fragmented body of RIR work, summarizes open challenges, and outlines directions for future research.
📝 Abstract
Reasoning-Intensive Retrieval (RIR) targets retrieval settings where relevance is mediated by latent inferential links between a query and supporting evidence, rather than by semantic similarity. Motivated by the emergent reasoning abilities of Large Language Models (LLMs), recent work integrates these capabilities into information retrieval (IR), spanning the entire pipeline from benchmarks to retrievers and rerankers. Despite this progress, the field lacks a systematic framework to organize current efforts and articulate a path forward. To provide a roadmap for this rapidly growing yet fragmented area, this survey (1) systematizes existing RIR benchmarks by knowledge domain and modality, providing a detailed analysis of the current landscape; (2) introduces a structured taxonomy that categorizes methods based on where and how reasoning is integrated into the retrieval pipeline, alongside an analysis of their trade-offs and practical applications; and (3) summarizes challenges and future directions to guide research in this evolving field.