AI Summary
Existing generative retrieval methods model only the mapping between queries and document IDs, neglecting semantic relevance between queries and document content, which limits their representational capacity. To address this, we propose DOGR, a document-oriented generative retrieval framework featuring a novel two-stage contrastive learning mechanism: (i) explicitly injecting document content into query encoding, and (ii) optimizing query-document joint representations via dynamic negative sampling and adaptive contrastive loss. DOGR transcends conventional ID-level modeling without modifying the architecture of generative language models and remains compatible with diverse document ID construction schemes. Evaluated on two major public benchmarks, DOGR consistently outperforms state-of-the-art methods, achieving substantial gains in both retrieval accuracy and generalization capability.
Abstract
Generative retrieval constitutes an innovative approach in information retrieval, leveraging generative language models (LM) to generate a ranked list of document identifiers (docid) for a given query. It simplifies the retrieval pipeline by replacing the large external index with model parameters. However, existing works merely learned the relationship between queries and document identifiers, which is unable to directly represent the relevance between queries and documents. To address the above problem, we propose a novel and general generative retrieval framework, namely Leveraging Document-Oriented Contrastive Learning in Generative Retrieval (DOGR), which leverages contrastive learning to improve generative retrieval tasks. It adopts a two-stage learning strategy that captures the relationship between queries and documents comprehensively through direct interactions. Furthermore, negative sampling methods and corresponding contrastive learning objectives are implemented to enhance the learning of semantic representations, thereby promoting a thorough comprehension of the relationship between queries and documents. Experimental results demonstrate that DOGR achieves state-of-the-art performance compared to existing generative retrieval methods on two public benchmark datasets. Further experiments have shown that our framework is generally effective for common identifier construction techniques.
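To make the idea of a contrastive objective over queries and documents concrete, the following is a minimal sketch of a generic InfoNCE-style loss for a single query with one relevant document and a set of sampled negatives. It is not DOGR's actual objective, which the abstract does not specify. The function names `dot` and `info_nce`, the temperature value, and the toy two-dimensional embeddings are all assumptions made for illustration.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors (used as the similarity score)."""
    return sum(a * b for a, b in zip(u, v))


def info_nce(query, positive, negatives, temperature=0.1):
    """Generic InfoNCE-style loss for one query (illustrative, not DOGR's loss).

    Returns -log of the softmax probability assigned to the positive document
    among the positive and all negative documents.
    """
    scores = [dot(query, positive)] + [dot(query, n) for n in negatives]
    logits = [s / temperature for s in scores]
    m = max(logits)  # subtract the max before exponentiating, for stability
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]


# Toy embeddings (hypothetical): the relevant document points roughly the
# same way as the query, the irrelevant ones do not.
q = [1.0, 0.0]
relevant = [0.9, 0.1]
irrelevant = [[0.0, 1.0], [-1.0, 0.0]]

loss_good = info_nce(q, relevant, irrelevant)
loss_bad = info_nce(q, irrelevant[0], [relevant, irrelevant[1]])
```

In this sketch the loss is small when the query embedding is closer to the relevant document than to the negatives (`loss_good`) and large when an irrelevant document is treated as the positive (`loss_bad`). How negatives are sampled and how embeddings are obtained from the generative LM are left open here, since the abstract gives no detail on either.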