DOGR: Leveraging Document-Oriented Contrastive Learning in Generative Retrieval

📅 2025-02-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing generative retrieval methods model only the mapping between queries and document IDs, neglecting semantic relevance between queries and document content, which limits representational capacity. To address this, we propose DOGR, a document-oriented generative retrieval framework featuring a novel two-stage contrastive learning mechanism: (i) explicitly injecting document content into query encoding, and (ii) optimizing query-document joint representations via dynamic negative sampling and adaptive contrastive loss. DOGR transcends conventional ID-level modeling without modifying the architecture of generative language models and remains compatible with diverse document ID construction schemes. Evaluated on two major public benchmarks, DOGR consistently outperforms state-of-the-art methods, achieving substantial gains in both retrieval accuracy and generalization capability.

📝 Abstract

Generative retrieval constitutes an innovative approach in information retrieval, leveraging generative language models (LM) to generate a ranked list of document identifiers (docid) for a given query. It simplifies the retrieval pipeline by replacing the large external index with model parameters. However, existing works merely learned the relationship between queries and document identifiers, which is unable to directly represent the relevance between queries and documents. To address the above problem, we propose a novel and general generative retrieval framework, namely Leveraging Document-Oriented Contrastive Learning in Generative Retrieval (DOGR), which leverages contrastive learning to improve generative retrieval tasks. It adopts a two-stage learning strategy that captures the relationship between queries and documents comprehensively through direct interactions. Furthermore, negative sampling methods and corresponding contrastive learning objectives are implemented to enhance the learning of semantic representations, thereby promoting a thorough comprehension of the relationship between queries and documents. Experimental results demonstrate that DOGR achieves state-of-the-art performance compared to existing generative retrieval methods on two public benchmark datasets. Further experiments have shown that our framework is generally effective for common identifier construction techniques.

Problem

Research questions and friction points this paper is trying to address.

Improving generative retrieval tasks

Enhancing query-document relevance

Implementing contrastive learning objectives

Innovation

Methods, ideas, or system contributions that make the work stand out.

Document-Oriented Contrastive Learning

Two-Stage Learning Strategy

Negative Sampling Methods
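
To make the idea of a contrastive objective over sampled negatives concrete, here is a minimal sketch of a generic InfoNCE-style loss for a single query. This is not taken from the paper: the function name `info_nce_loss`, the use of cosine similarity, and the `temperature` value are illustrative assumptions, and DOGR's actual objectives and negative sampling methods may differ.

```python
import math


def info_nce_loss(query, positive, negatives, temperature=0.1):
    """Generic InfoNCE-style loss for one query (illustrative, not DOGR's exact loss).

    query, positive: embedding vectors given as lists of floats.
    negatives: list of embedding vectors for sampled negative documents.
    """

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm

    # Similarity logits: the positive document first, then each negative.
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, neg) / temperature for neg in negatives]

    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))

    # Negative log-probability of picking the positive among all candidates.
    return log_denominator - logits[0]


# Hypothetical toy embeddings: the loss is small when the query is close to
# the positive document and large when it is closer to a negative.
aligned = info_nce_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
misaligned = info_nce_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

In this sketch the choice of negatives matters: harder negatives (documents similar to the query but not relevant) raise the loss and give a stronger training signal, which is the general motivation for dedicated negative sampling methods.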

🔎 Similar Papers

Is Contrasting All You Need? Contrastive Learning for the Detection and Attribution of AI-generated Text