Model Editing for New Document Integration in Generative Information Retrieval

📅 2026-03-03

🤖 AI Summary

This work addresses a limitation of generative retrieval (GR) models: when unseen documents are added, the models generalize poorly to the new document identifiers (docIDs), and incremental retraining is computationally expensive and prone to catastrophic forgetting. The authors locate the bottleneck in the decoder's mapping from hidden states to the docIDs of new documents and propose DOME (docID-oriented model editing), a model-editing method tailored to docID generation. DOME identifies critical layers, optimizes discriminative edit vectors, and then constructs and applies targeted parameter updates. Its hybrid-label adaptive training strategy combines soft labels, which preserve query-specific semantics so that edits stay distinguishable across queries, with hard labels, which enforce precise docID mapping. On the NQ and MS MARCO benchmarks, DOME significantly improves retrieval performance on newly added documents while maintaining effectiveness on the original collection, and it needs only about 60% of the training time of conventional incremental training, substantially reducing computational overhead.
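Neither the summary nor the abstract gives the exact loss, so the following is only a minimal sketch of one plausible hybrid-label objective for a single decoding step: cross-entropy against a hard one-hot docID token, plus a KL-divergence term that pulls the model's distribution toward a soft target. The function names, the use of KL divergence, the mixing weight `alpha`, and wherever the soft targets come from are all assumptions for illustration, not DOME's published formulation.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one decoding step's docID-token logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def hybrid_label_loss(
    logits: Sequence[float],
    hard_target: int,
    soft_target: Sequence[float],
    alpha: float = 0.5,
) -> float:
    """Hypothetical hybrid-label objective (an assumption, not DOME's published loss).

    hard term: cross-entropy against the one-hot docID token `hard_target`
    soft term: KL(soft_target || model distribution)
    alpha:     assumed mixing weight between the two terms
    """
    probs = softmax(logits)
    hard_term = -math.log(probs[hard_target])
    # Skip zero-probability soft targets, since q * log(q / p) -> 0 as q -> 0.
    soft_term = sum(q * math.log(q / p) for q, p in zip(soft_target, probs) if q > 0)
    return alpha * hard_term + (1.0 - alpha) * soft_term


# Usage: a toy vocabulary of three docID tokens; token 0 is the new document's docID.
print(hybrid_label_loss([2.0, 0.5, 0.1], hard_target=0, soft_target=[0.6, 0.3, 0.1]))
```

Setting `alpha = 1` reduces this sketch to plain cross-entropy on the hard label; lowering `alpha` lets the soft target shape the distribution per query instead of forcing a single spike.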

📝 Abstract

Generative retrieval (GR) reformulates the information retrieval (IR) task as the generation of document identifiers (docIDs). Despite its promise, existing GR models generalize poorly to newly added documents and often fail to generate their correct docIDs. While incremental training offers a straightforward remedy, it is computationally expensive, resource-intensive, and prone to catastrophic forgetting, thereby limiting the scalability and practicality of GR. In this paper, we identify the core bottleneck as the decoder's ability to map hidden states to the correct docIDs of newly added documents. Model editing, which enables targeted parameter modifications for docID mapping, is a promising solution. However, applying model editing to current GR models is non-trivial: it is severely hindered by edit vectors that are indistinguishable across queries, owing to the high overlap of docIDs shared among retrieval results. To address this, we propose DOME (docID-oriented model editing), a novel method that effectively and efficiently adapts GR models to unseen documents. DOME comprises three stages: (1) identification of critical layers, (2) optimization of edit vectors, and (3) construction and application of updates. At its core, DOME employs a hybrid-label adaptive training strategy that learns discriminative edit vectors by combining soft labels, which preserve query-specific semantics for distinguishable updates, with hard labels, which enforce precise mapping modifications. Experiments on widely used benchmarks, including NQ and MS MARCO, show that our method significantly improves retrieval performance on new documents while maintaining effectiveness on the original collection. Moreover, DOME achieves this with only about 60% of the training time required by incremental training, considerably reducing computational cost and enabling efficient, frequent model updates.
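The abstract names the three stages but not the update rule. As a hedged illustration of stage (3), constructing and applying an update to one layer, the sketch below uses a generic ridge-regularized least-squares edit, a simplification in the spirit of locate-then-edit methods such as MEMIT (which use a covariance of preserved keys where this sketch uses `lam * I`). It treats a linear layer `W` as a key-value map, takes query hidden states as keys `K` and the desired outputs (edit vectors) as values `V`, and solves for the weight change. The function names, the `lam` regularizer, and the restriction to a single linear layer are assumptions for illustration, not DOME's published method.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + list(rhs) for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [x / p for x in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def least_squares_edit(W: Matrix, K: Matrix, V: Matrix, lam: float = 1e-2) -> Matrix:
    """Hypothetical edit: return W + dW, with dW minimizing
    ||(W + dW) K - V||^2 + lam * ||dW||^2 (Frobenius norms).

    W: d_out x d_in weight of one (assumed) critical linear layer
    K: d_in x n, one key (query hidden state) per column
    V: d_out x n, one target value (edit vector) per column
    Closed form: dW = R K^T (K K^T + lam I)^{-1}, where R = V - W K.
    """
    WK = matmul(W, K)
    R = [[v - wk for v, wk in zip(v_row, wk_row)] for v_row, wk_row in zip(V, WK)]
    A = matmul(K, transpose(K))
    for i in range(len(A)):
        A[i][i] += lam
    # A is symmetric, so dW A = R K^T  <=>  A dW^T = K R^T.
    dW = transpose(solve(A, matmul(K, transpose(R))))
    return [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, dW)]


# Usage: a 2x2 identity layer; redirect key e1 toward the target [0, 1].
W_new = least_squares_edit([[1.0, 0.0], [0.0, 1.0]], K=[[1.0], [0.0]], V=[[0.0], [1.0]])
print(matmul(W_new, [[1.0], [0.0]]))  # edited key: close to the target
print(matmul(W_new, [[0.0], [1.0]]))  # orthogonal key: unchanged
```

With `lam > 0` the system is always solvable, and any input orthogonal to every edited key is left untouched by this update. Because the update is driven by the columns of `V`, nearly identical edit vectors for different queries push the layer toward nearly the same output for all of them, which is one way to read the "indistinguishable edit vectors" problem the abstract describes.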

Problem

generative retrieval

model editing

document integration

docID generation

catastrophic forgetting

Innovation

model editing

generative retrieval

docID mapping