Lost in Decoding? Reproducing and Stress-Testing the Look-Ahead Prior in Generative Retrieval

📅 2026-04-25
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses a weakness of generative retrieval under limited beam-width decoding: relevant documents are often missed because their prefixes are pruned early, and the planning signal is highly sensitive to query perturbations. The study reproduces and stress-tests PAG, which computes a document-level look-ahead prior through simultaneous decoding and uses it to guide sequential decoding. It introduces a diagnostic termed “plan drift” to quantify how surface-form variations in a query affect planning stability. The paper also explores mitigation strategies that require no re-indexing, covering cross-lingual retrieval and query spelling errors. Experiments reproduce PAG's main results on MS MARCO Dev and TREC-DL 2019/2020; however, intent-preserving misspellings can induce plan collapse, while query translation substantially recovers cross-lingual performance.
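
The pruning failure and the effect of a look-ahead prior can be illustrated with a toy sketch. Everything below is hypothetical: the function `beam_search`, the three documents, the per-step token scores, and the rule that a prefix's bonus is the best prior among documents under that prefix are assumptions made for illustration, not PAG's actual scoring or the authors' released code.

```python
def beam_search(doc_ids, step_scores, beam_size, prior=None):
    """Toy trie-constrained beam search over fixed-length identifiers.

    doc_ids:     dict mapping document name -> tuple of tokens (assumed unique)
    step_scores: list with one dict per step mapping token -> log-score
                 (assumed independent of the prefix, for simplicity)
    prior:       optional dict mapping document name -> look-ahead score
    """
    beams = [((), 0.0)]  # (prefix, accumulated sequence score)
    for step, scores in enumerate(step_scores):
        candidates = []
        for prefix, score in beams:
            for token, token_score in scores.items():
                new_prefix = prefix + (token,)
                # Trie constraint: keep only prefixes of a valid identifier.
                under = [d for d, ident in doc_ids.items()
                         if ident[:step + 1] == new_prefix]
                if not under:
                    continue
                # Assumed bonus rule: best prior among documents under the prefix.
                bonus = max(prior.get(d, 0.0) for d in under) if prior else 0.0
                candidates.append((new_prefix, score + token_score, bonus))
        # Prune on sequence score plus bonus; the bonus is not accumulated.
        candidates.sort(key=lambda c: c[1] + c[2], reverse=True)
        beams = [(p, s) for p, s, _ in candidates[:beam_size]]
    by_ident = {ident: name for name, ident in doc_ids.items()}
    return [by_ident[p] for p, _ in beams]


doc_ids = {"A": (1, 1), "B": (2, 1), "C": (2, 2)}
step_scores = [{1: -1.0, 2: -0.5}, {1: -0.2, 2: -0.9}]
prior = {"A": 2.0, "B": 0.1, "C": 0.0}  # pretend the planner favours A

unguided = beam_search(doc_ids, step_scores, beam_size=1)
guided = beam_search(doc_ids, step_scores, beam_size=1, prior=prior)
print(unguided, guided)
```

With a beam of one, the unguided search drops the prefix `(1,)` at the first step and can never reach document A, whereas the bonus keeps that prefix alive. If the prior itself pointed at the wrong documents, the same mechanism would offer no help, which is the situation the summary describes as plan collapse.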

📝 Abstract
Generative retrieval (GR) ranks documents by autoregressively generating document identifiers. Because many GR methods rely on trie-constrained beam search, they are vulnerable to early pruning of relevant prefixes under finite-beam decoding. Planning Ahead in Generative Retrieval (PAG) mitigates this failure mode by using simultaneous decoding to compute a document-level look-ahead prior that guides subsequent sequential decoding. We reproduce PAG at inference time and stress-test its decoding behavior. Using the authors' released checkpoint and identifier/trie artifacts under the reported decoding setup, we reproduce the main effectiveness results on MS MARCO Dev and TREC-DL 2019/2020, and corroborate the reported beam-size-latency trade-off in our hardware setting. Beyond reproduction, we introduce plan drift diagnostics that quantify how intent-preserving query variations alter the planner's top-n candidate set and highest-weight planner tokens, and how these changes affect guided decoding. We find that PAG's planning signal is brittle under lexical surface-form variation: intent-preserving typos can trigger plan collapse, where the planned candidate pool shifts enough that the look-ahead bonus provides little useful guidance, effectively reverting decoding toward weaker unguided search. We further evaluate fixed-index cross-lingual robustness using non-English mMARCO queries against an English index, and assess query-side mitigation strategies that require no re-indexing; query translation provides the strongest recovery in our setting. Overall, our results confirm PAG's reported effectiveness and the benefit of planning-guided decoding under the released inference setup, while showing that these gains depend on the stability of the planning signal under realistic query variation and query-document mismatch.
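
One simple way to picture a plan drift measurement on the planner's top-n candidate set is as one minus the Jaccard overlap between the sets obtained for the original query and for a perturbed variant. This is a minimal sketch under that assumption; the abstract does not give the exact definition of the diagnostics, and the names `top_n`, `plan_drift`, and the scores below are hypothetical.

```python
def top_n(scores, n):
    """Return the set of the n highest-scoring document ids.

    Ties are broken by document id so the result is deterministic.
    """
    ranked = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return set(ranked[:n])


def plan_drift(original_scores, variant_scores, n):
    """Return 1 - Jaccard overlap of the two top-n candidate sets.

    0.0 means the planned candidate pools are identical;
    1.0 means they share no documents.
    """
    a = top_n(original_scores, n)
    b = top_n(variant_scores, n)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Hypothetical planner scores for a query and a misspelled variant of it.
original = {"d1": 0.9, "d2": 0.8, "d3": 0.4, "d4": 0.1}
typo = {"d1": 0.2, "d2": 0.7, "d3": 0.1, "d4": 0.6}

drift = plan_drift(original, typo, n=2)
print(round(drift, 3))
```

Here the top-2 pool changes from `{d1, d2}` to `{d2, d4}`, so one of three distinct candidates is shared and the drift is two thirds. A set-overlap measure ignores rank order within the pool; a rank-aware alternative would be a reasonable variation.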
Problem

Research questions and friction points this paper is trying to address.

generative retrieval
look-ahead prior
decoding robustness
query variation
cross-lingual retrieval
Innovation

Methods, ideas, or system contributions that make the work stand out.

plan drift diagnostics
generative retrieval
look-ahead prior
decoding robustness
query variation