Generative Retrieval Overcomes Limitations of Dense Retrieval but Struggles with Identifier Ambiguity

📅 2026-04-07
🤖 AI Summary
This study addresses the theoretical limitations of traditional dense retrieval and the vulnerability of generative retrieval to document identifier ambiguity. It is the first to systematically evaluate generative retrieval on the synthetic LIMIT dataset, using the SEAL and MINDER models with BM25 and dense retrieval as baselines. On the original LIMIT dataset, the generative approaches reach R@2 scores of 0.92–0.99 without additional training, substantially outperforming dense retrieval (<0.03) and BM25 (0.86). Once simple hard negatives are added, however, performance degrades for all models: the generative models drop to 0.51 R@2 and BM25 to 0.21. Error analysis traces this bottleneck to the decoding mechanism, which fails to produce identifiers that are unique to the relevant documents. The work thus confirms the advantage of generative retrieval under idealized conditions while identifying identifier ambiguity as a key challenge to its robustness.
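All scores above are Recall@2 (R@2). As a minimal sketch, assuming the common definition (the share of a query's relevant documents that appear among its top k results, averaged over queries), the metric can be computed as below. The paper's own evaluation code is not shown here; the function names and the toy run are hypothetical.

```python
from typing import Dict, Sequence, Set


def recall_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of a query's relevant documents that appear in its top-k results."""
    if not relevant_ids:
        raise ValueError("a query needs at least one relevant document")
    return len(set(ranked_ids[:k]) & relevant_ids) / len(relevant_ids)


def mean_recall_at_k(
    runs: Dict[str, Sequence[str]], qrels: Dict[str, Set[str]], k: int
) -> float:
    """Recall@k averaged over every query in qrels; a query with no run scores 0."""
    scores = [recall_at_k(runs.get(qid, []), rel, k) for qid, rel in qrels.items()]
    return sum(scores) / len(scores)


# Hypothetical toy run in which each query has two relevant documents.
runs = {"q1": ["d3", "d7", "d1"], "q2": ["d2", "d9", "d4"]}
qrels = {"q1": {"d3", "d7"}, "q2": {"d2", "d4"}}
print(mean_recall_at_k(runs, qrels, k=2))  # 0.75
```

Here q1 retrieves both relevant documents in its top 2 (recall 1.0) while q2 retrieves only one of two (recall 0.5), giving a mean of 0.75. When every query has exactly k relevant documents, R@k rewards a system only for placing all of them at the very top of the ranking.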
📝 Abstract
While dense retrieval models, which embed queries and documents into a shared low-dimensional space, have gained widespread popularity, they were shown to exhibit important theoretical limitations and considerably lag behind traditional sparse retrieval models in certain settings. Generative retrieval has emerged as an alternative approach to dense retrieval by using a language model to predict query-document relevance directly. In this paper, we demonstrate strengths and weaknesses of generative retrieval approaches using a simple synthetic dataset, called LIMIT, that was previously introduced to empirically demonstrate the theoretical limitations of embedding-based retrieval but was not used to evaluate generative retrieval. We close this research gap and show that generative retrieval achieves the best performance on this dataset without any additional training required (0.92 and 0.99 R@2 for SEAL and MINDER, respectively), compared to dense approaches (< 0.03 R@2) and BM25 (0.86 R@2). However, we then extend the original LIMIT dataset by adding simple hard negative samples and observe the performance degrading for all the models, including the generative retrieval models (0.51 R@2) as well as BM25 (0.21 R@2). Error analysis identifies a failure in the decoding mechanism, caused by the inability to produce identifiers that are unique to relevant documents. Future generative retrieval must address these issues, either by designing identifiers that are more suitable to the decoding process or by adapting decoding and scoring algorithms to preserve relevance signals.
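To make "identifiers that are unique to relevant documents" concrete, here is a toy sketch assuming substring (n-gram) identifiers in the spirit of SEAL, where a generated n-gram points to every document that contains it. Plain word matching stands in for the FM-index SEAL actually uses, and the corpus, the hard-negative document and the `identifier_precision` diagnostic are all hypothetical illustrations, not the paper's data or method.

```python
import re
from typing import Dict, Set, Tuple


def tokenize(text: str) -> Tuple[str, ...]:
    """Lowercase word tokens; punctuation is dropped."""
    return tuple(re.findall(r"[a-z]+", text.lower()))


def contains_ngram(tokens: Tuple[str, ...], ngram: Tuple[str, ...]) -> bool:
    """True if ngram occurs as a contiguous run of tokens."""
    n = len(ngram)
    return any(tokens[i:i + n] == ngram for i in range(len(tokens) - n + 1))


def matching_docs(identifier: str, corpus: Dict[str, str]) -> Set[str]:
    """All documents an n-gram identifier points to (every doc containing it)."""
    ngram = tokenize(identifier)
    return {d for d, text in corpus.items() if contains_ngram(tokenize(text), ngram)}


def identifier_precision(identifier: str, corpus: Dict[str, str], relevant: Set[str]) -> float:
    """Hypothetical diagnostic: share of the matched documents that are relevant."""
    hits = matching_docs(identifier, corpus)
    return len(hits & relevant) / len(hits) if hits else 0.0


# Hypothetical LIMIT-style corpus; the query "Who likes quokkas?" has d1 and d2 as relevant.
corpus = {
    "d1": "Ana likes quokkas and apples.",
    "d2": "Ben likes quokkas and chess.",
    "d3": "Cleo likes tea and chess.",
}
relevant = {"d1", "d2"}
print(identifier_precision("quokkas", corpus, relevant))  # 1.0

# Add one hypothetical hard negative that shares the query term.
corpus_hn = {**corpus, "d4": "Dan does not like quokkas."}
print(sorted(matching_docs("quokkas", corpus_hn)))  # ['d1', 'd2', 'd4']
print(round(identifier_precision("quokkas", corpus_hn, relevant), 2))  # 0.67
print(identifier_precision("likes quokkas", corpus_hn, relevant))  # 1.0
```

In this toy case the short identifier "quokkas" becomes ambiguous once the hard negative is added, while the longer "likes quokkas" still points only to relevant documents. This is consistent with the abstract's finding that the difficulty lies in reliably decoding such unique identifiers rather than in their mere existence.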
Problem

Research questions and friction points this paper is trying to address.

Generative Retrieval
Identifier Ambiguity
Dense Retrieval
Document Identification
Retrieval Failure
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative Retrieval
Dense Retrieval
Identifier Ambiguity
LIMIT Dataset
Hard Negative Mining
Adrian Bracher
Vienna University of Economics and Business, Vienna, Austria
Svitlana Vakulenko
Vienna University of Economics and Business
RAG, Conversational Search, Information Seeking