🤖 AI Summary
This work addresses the high cost and catastrophic forgetting associated with retraining generative retrieval models when expanding the corpus. To enable dynamic, training-free corpus extension, the authors propose ICICLE, a framework that reframes incremental generative retrieval as an in-context retrieval problem. By injecting newly added documents along with their docids at inference time, ICICLE performs source-aware docid generation over both parametric memory and contextual documents without requiring retraining. The framework combines a `[COPY]`-based routing mechanism, a preference-based calibration strategy, and large-context adaptation techniques to distinguish context-grounded retrieval from parametric retrieval. Experiments on MS MARCO and NQ320K show that ICICLE improves retrieval of newly added documents while preserving retention of previously seen ones, all without corpus-specific retraining.
📝 Abstract
Generative retrieval (GR) maps queries directly to document identifiers (docids) using parametric knowledge. However, this design makes corpus expansion costly: adding new documents requires updating model parameters to encode new document-docid associations, which incurs repeated training and catastrophic forgetting of previously indexed documents. In this work, we revisit incremental GR as an in-context retrieval problem, where newly added documents are supplied as inference-time document-docid evidence. We propose ICICLE, an in-context indexing framework that performs source-aware docid generation over both parametric memory and context-provided document-docid pairs. ICICLE combines a `[COPY]`-based routing mechanism, preference-based calibration, and large-context adaptation to distinguish context-grounded retrieval from parametric retrieval. Experiments on MS MARCO and NQ320K show that ICICLE improves retrieval of newly introduced documents while preserving seen-document retention without corpus-specific retraining. Our analysis further shows that high-shot degradation is mainly caused by routing failure, highlighting source-selection calibration as a key bottleneck for scaling in-context generative retrieval.
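To make the idea of source-aware docid generation concrete, here is a minimal sketch of how context-provided document-docid pairs and a `[COPY]` routing marker might fit together. The abstract does not specify the prompt layout or how the marker is emitted, so everything here is an assumption for illustration: the names `ContextDoc`, `build_prompt`, and `route_output`, the prompt format, and the convention that the model prefixes a context-grounded docid with `[COPY]`. No model is called; the generated string is passed in directly.

```python
from dataclasses import dataclass

# Assumed convention: the model emits this marker before a docid that it
# copies from the in-context document-docid pairs.
COPY_TOKEN = "[COPY]"


@dataclass(frozen=True)
class ContextDoc:
    """A newly added document supplied at inference time (hypothetical type)."""
    docid: str
    text: str


def build_prompt(query: str, new_docs: list[ContextDoc]) -> str:
    """Lay out document-docid pairs followed by the query (assumed format)."""
    lines = [f"docid: {d.docid} | document: {d.text}" for d in new_docs]
    lines.append(f"query: {query}")
    lines.append("docid:")
    return "\n".join(lines)


def route_output(generated: str, new_docs: list[ContextDoc]) -> tuple[str, str]:
    """Classify a generated string as context-grounded or parametric.

    Returns (source, docid), where source is "context", "parametric",
    or "invalid" when a copied docid is not among the in-context pairs.
    """
    generated = generated.strip()
    if generated.startswith(COPY_TOKEN):
        docid = generated[len(COPY_TOKEN):].strip()
        known = {d.docid for d in new_docs}
        if docid not in known:
            return ("invalid", docid)
        return ("context", docid)
    # No marker: treat the docid as coming from parametric memory.
    return ("parametric", generated)


docs = [ContextDoc(docid="D-901", text="A passage added after training.")]
prompt = build_prompt("what was added after training?", docs)
context_route = route_output("[COPY] D-901", docs)
parametric_route = route_output("D-17", docs)
invalid_route = route_output("[COPY] D-999", docs)
```

In this toy setup, a routing failure of the kind the analysis highlights would appear as the wrong source label, for example a docid that exists only in context being generated without the marker. How ICICLE actually trains and calibrates that decision is not something this sketch attempts to reproduce.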