Continual Learning for Generative Retrieval over Dynamic Corpora

📅 2023-08-29

🏛️ International Conference on Information and Knowledge Management

📈 Citations: 43

✨ Influential: 4


🤖 AI Summary

This work addresses continual learning for generative retrieval (GR) over document collections that evolve over time: how to index newly arriving documents incrementally and efficiently while preserving retrieval performance on both earlier and newly added documents. To this end, it proposes CLEVER, a framework featuring (i) an incremental product quantization mechanism with two adaptive thresholds that updates only part of the quantization codebook as new documents arrive, and (ii) a memory-augmented learning module that explicitly models relationships between old and new documents to mitigate catastrophic forgetting. Evaluated on multiple incremental retrieval benchmarks, CLEVER improves retrieval accuracy and inference efficiency and reduces forgetting rates by over 40%, while maintaining high indexing efficiency, strong knowledge retention, and good generalization.
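The summary does not define "forgetting rate". A common continual-learning definition, assumed here and not necessarily the metric the paper reports, is average forgetting: for each earlier session, take the drop from its best score before the final session to its score after the final session, then average over sessions. A minimal sketch; the function name average_forgetting and the score layout are hypothetical:

```python
from typing import List


def average_forgetting(acc: List[List[float]]) -> float:
    """Average forgetting over all sessions except the last.

    acc[t][j] is the retrieval score on session j's queries measured after
    training through session t (so row t has t + 1 entries, j <= t).
    A negative result would mean scores on old sessions improved.
    """
    T = len(acc) - 1  # index of the final session
    if T == 0:
        return 0.0  # nothing learned before, so nothing to forget
    drops = []
    for j in range(T):
        # best score session j ever reached before the final session
        best_before = max(acc[t][j] for t in range(j, T))
        drops.append(best_before - acc[T][j])
    return sum(drops) / len(drops)


scores = [
    [0.60],              # after session 0
    [0.50, 0.70],        # after session 1
    [0.45, 0.62, 0.68],  # after session 2
]
print(average_forgetting(scores))  # approximately 0.115
```

Here session 0 drops by 0.15 and session 1 by 0.08, so the average is about 0.115; lower values mean less forgetting.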

📝 Abstract

Generative retrieval (GR) directly predicts the identifiers of relevant documents (i.e., docids) based on a parametric model. It has achieved solid performance on many ad-hoc retrieval tasks. So far, these tasks have assumed a static document collection. In many practical scenarios, however, document collections are dynamic, where new documents are continuously added to the corpus. The ability to incrementally index new documents while preserving the ability to answer queries with both previously and newly indexed relevant documents is vital to applying GR models. In this paper, we address this practical continual learning problem for GR. We put forward a novel Continual-LEarner for generatiVE Retrieval (CLEVER) model and make two major contributions to continual learning for GR: (i) To encode new documents into docids with low computational cost, we present Incremental Product Quantization, which updates a partial quantization codebook according to two adaptive thresholds; and (ii) To memorize new documents for querying without forgetting previous knowledge, we propose a memory-augmented learning mechanism, to form meaningful connections between old and new documents. Empirical results demonstrate the effectiveness and efficiency of the proposed model.
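The abstract states that CLEVER encodes documents into docids with (incremental) product quantization. As background, a plain, non-incremental PQ encoder can be sketched as follows. It assumes dense document embeddings already exist (random vectors stand in for them), uses toy sizes, and uses per-subspace k-means as a stand-in for however the paper actually trains its codebooks; train_pq and encode_docid are hypothetical names. The docid is the tuple of nearest-centroid indices, one per subspace:

```python
import numpy as np


def train_pq(embeddings: np.ndarray, num_subspaces: int, num_centroids: int,
             iters: int = 10, seed: int = 0) -> np.ndarray:
    """Learn one k-means codebook per subspace; returns shape (M, K, dim // M)."""
    n, dim = embeddings.shape
    assert dim % num_subspaces == 0, "dim must split evenly into subspaces"
    sub_dim = dim // num_subspaces
    rng = np.random.default_rng(seed)
    codebooks = np.empty((num_subspaces, num_centroids, sub_dim))
    for m in range(num_subspaces):
        sub = embeddings[:, m * sub_dim:(m + 1) * sub_dim]
        # initialize centroids from distinct random data points
        centroids = sub[rng.choice(n, num_centroids, replace=False)].copy()
        for _ in range(iters):
            # squared distance from every subvector to every centroid: (n, K)
            dists = ((sub[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
            assign = dists.argmin(1)
            for k in range(num_centroids):
                members = sub[assign == k]
                if len(members):  # empty clusters keep their old centroid
                    centroids[k] = members.mean(0)
        codebooks[m] = centroids
    return codebooks


def encode_docid(embedding: np.ndarray, codebooks: np.ndarray) -> tuple:
    """Map one embedding to a docid: the nearest centroid index in each subspace."""
    num_subspaces, _, sub_dim = codebooks.shape
    subs = embedding.reshape(num_subspaces, sub_dim)
    dists = ((subs[:, None, :] - codebooks) ** 2).sum(-1)  # (M, K)
    return tuple(int(c) for c in dists.argmin(1))


rng = np.random.default_rng(0)
doc_embeddings = rng.normal(size=(200, 16))  # stand-in for encoder outputs
codebooks = train_pq(doc_embeddings, num_subspaces=4, num_centroids=8)
print(encode_docid(doc_embeddings[0], codebooks))  # a 4-tuple of ints in [0, 8)
```

Because each of the M positions takes one of K values, M = 4 and K = 8 already give 8^4 = 4096 distinct docids from only 32 centroids, which is one reason PQ codes make a compact docid space for a model to generate token by token.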

Problem

Research questions and friction points this paper is trying to address.

Addressing continual learning for generative retrieval in dynamic document collections

Developing incremental indexing methods to encode new documents efficiently

Preventing catastrophic forgetting while integrating new knowledge into retrieval models
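One standard way to approach the forgetting problem is rehearsal: keep a small memory of earlier training pairs and mix them into each new training batch. The sketch below shows that generic technique, swapped in for illustration; it is not CLEVER's memory-augmented learning mechanism, whose details are not given here. It assumes training data are (query, docid) pairs, and ReplayMemory and mixed_batch are hypothetical names:

```python
import random
from typing import List, Tuple

Pair = Tuple[str, Tuple[int, ...]]  # (query text, docid)


class ReplayMemory:
    """Fixed-size memory of (query, docid) pairs filled by reservoir sampling."""

    def __init__(self, capacity: int, seed: int = 0):
        self.capacity = capacity
        self.items: List[Pair] = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, pair: Pair) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(pair)
        else:
            # replace a random slot with probability capacity / seen,
            # so every pair seen so far is equally likely to be kept
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = pair

    def sample(self, k: int) -> List[Pair]:
        return self.rng.sample(self.items, min(k, len(self.items)))


def mixed_batch(new_pairs: List[Pair], memory: ReplayMemory,
                replay_ratio: float = 0.5) -> List[Pair]:
    """New pairs first, followed by replayed pairs from earlier sessions."""
    n_replay = int(len(new_pairs) * replay_ratio)
    return list(new_pairs) + memory.sample(n_replay)


memory = ReplayMemory(capacity=1000)
memory.add(("what is generative retrieval", (3, 1, 7, 0)))  # earlier session
batch = mixed_batch([("query about a new doc", (5, 2, 2, 6))], memory, replay_ratio=1.0)
print(batch)
```

Reservoir sampling keeps each pair seen so far with the same probability, capacity / seen, so older sessions are not crowded out by the most recent one even though the memory size is fixed.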

Innovation

Methods, ideas, or system contributions that make the work stand out.

Incremental Product Quantization for low-cost docid encoding

Memory-augmented learning to prevent forgetting of previous knowledge

Two adaptive thresholds for efficient updates to a partial quantization codebook
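The abstract says only that Incremental Product Quantization updates part of the codebook according to two adaptive thresholds; it does not state the rule. The sketch below is a hypothetical rule for a single sub-codebook, written only to make the idea concrete. It assumes thresholds are taken from quantiles of the old documents' quantization distances. A new subvector close to its nearest centroid is assigned without changing anything, a moderately distant one nudges that single centroid by a running-mean step, and a distant one opens a new centroid. IncrementalSubCodebook, low_q, and high_q are hypothetical names:

```python
import numpy as np


class IncrementalSubCodebook:
    """One PQ sub-codebook with a hypothetical two-threshold update rule."""

    def __init__(self, centroids: np.ndarray, old_subvectors: np.ndarray,
                 low_q: float = 0.5, high_q: float = 0.95):
        self.centroids = [c.astype(float) for c in centroids]
        d = self._dists(old_subvectors)  # (n, K)
        assign = d.argmin(1)
        nearest = d.min(1)
        # how many old subvectors each centroid currently represents
        self.counts = [int((assign == k).sum()) for k in range(len(self.centroids))]
        # assumption: thresholds adapt to the old data's quantization distances
        self.low = float(np.quantile(nearest, low_q))
        self.high = float(np.quantile(nearest, high_q))

    def _dists(self, x: np.ndarray) -> np.ndarray:
        """Euclidean distances from each row of x to each centroid."""
        c = np.stack(self.centroids)
        return np.sqrt(((x[:, None, :] - c[None]) ** 2).sum(-1))

    def add(self, x: np.ndarray) -> tuple:
        """Encode one new subvector, changing at most one codebook entry."""
        d = self._dists(x[None])[0]
        k = int(d.argmin())
        if d[k] <= self.low:
            return k, "assign"  # well covered: codebook untouched
        if d[k] <= self.high:
            # running-mean step moves only centroid k toward x
            self.counts[k] += 1
            self.centroids[k] += (x - self.centroids[k]) / self.counts[k]
            return k, "update"
        # poorly covered: open a new centroid at x
        self.centroids.append(x.astype(float))
        self.counts.append(1)
        return len(self.centroids) - 1, "new"


rng = np.random.default_rng(0)
old = rng.normal(size=(500, 4))  # stand-in for old documents' subvectors
book = IncrementalSubCodebook(old[:16], old)  # untrained centroids, for brevity
for x in rng.normal(size=(5, 4)):
    print(book.add(x))
```

In this sketch each new document touches at most one centroid per subspace, so indexing stays cheap, and existing docids keep their indices even when a centroid moves. Opening new centroids also grows the docid vocabulary, which a generative model would have to accommodate; the sketch does not address that.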
