DOGR: Leveraging Document-Oriented Contrastive Learning in Generative Retrieval

Date: 2025-02-11
AI Summary
Existing generative retrieval methods model only the mapping between queries and document IDs and neglect the semantic relevance between queries and document content, which limits their representational capacity. To address this, we propose DOGR, a document-oriented generative retrieval framework built on a two-stage contrastive learning mechanism: (i) explicitly injecting document content into query encoding, and (ii) optimizing joint query-document representations via dynamic negative sampling and an adaptive contrastive loss. DOGR goes beyond conventional ID-level modeling without modifying the architecture of the generative language model, and it remains compatible with diverse document ID construction schemes. Evaluated on two public benchmark datasets, DOGR consistently outperforms state-of-the-art generative retrieval methods in retrieval accuracy and generalizes across identifier schemes.

๐Ÿ“ Abstract
Generative retrieval is an innovative approach in information retrieval that leverages generative language models (LMs) to generate a ranked list of document identifiers (docids) for a given query. It simplifies the retrieval pipeline by replacing the large external index with model parameters. However, existing work learns only the relationship between queries and document identifiers, which cannot directly represent the relevance between queries and documents. To address this problem, we propose a novel and general generative retrieval framework, Leveraging Document-Oriented Contrastive Learning in Generative Retrieval (DOGR), which uses contrastive learning to improve generative retrieval. It adopts a two-stage learning strategy that captures the relationship between queries and documents comprehensively through direct interactions. Furthermore, negative sampling methods and corresponding contrastive learning objectives are implemented to enhance the learning of semantic representations, thereby promoting a thorough understanding of the relationship between queries and documents. Experimental results demonstrate that DOGR achieves state-of-the-art performance compared with existing generative retrieval methods on two public benchmark datasets. Further experiments show that the framework is generally effective across common identifier construction techniques.
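The abstract does not spell out DOGR's training objective, so the following is only a generic sketch of one common contrastive loss (InfoNCE) between a query embedding, a relevant document embedding, and sampled negative documents. The function names, the cosine scoring, the temperature value, and the toy vectors are all assumptions made for illustration; none of them come from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(query, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss for one query (illustrative, not DOGR's exact loss).

    Returns the negative log of the softmax probability assigned to the
    positive document among the positive and all negatives.
    """
    scores = [cosine(query, positive) / temperature]
    scores += [cosine(query, neg) / temperature for neg in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    top = max(scores)
    log_denominator = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_denominator - scores[0]


# Toy 2-d embeddings, made up for illustration only.
query = [1.0, 0.0]
relevant_doc = [0.9, 0.1]
irrelevant_docs = [[0.0, 1.0], [-1.0, 0.0]]

# Loss when the truly relevant document is treated as the positive.
loss_good = info_nce(query, relevant_doc, irrelevant_docs)
# Loss when an irrelevant document is (wrongly) treated as the positive.
loss_bad = info_nce(query, irrelevant_docs[0], [relevant_doc, irrelevant_docs[1]])
```

In this sketch the loss is small when the query embedding is closer to its relevant document than to the negatives, and large otherwise. A negative sampling method would decide which documents end up in `negatives`; how DOGR selects them is not described in the abstract.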
Problem

Research questions and friction points this paper is trying to address.

Improving generative retrieval tasks
Enhancing query-document relevance
Implementing contrastive learning objectives
Innovation

Methods, ideas, or system contributions that make the work stand out.

Document-Oriented Contrastive Learning
Two-Stage Learning Strategy
Negative Sampling Methods
Authors

Penghao Lu
Ant Group
Xin Dong
Ant Group
Yuansheng Zhou
Ant Group
Lei Cheng
Ant Group
Chuan Yuan
Ant Group
Linjian Mo
Ant Group