GeAR: Generation Augmented Retrieval

📅 2025-01-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing document retrieval methods suffer from two key limitations: a scalar similarity score is not expressive enough for fine-grained semantic understanding, and reliance on global representations neglects local semantic alignment between queries and documents. To address these issues, the authors propose GeAR (Generation Augmented Retrieval), a framework that adds fusion and decoding modules to the bi-encoder retrieval paradigm. This enables fine-grained query-document alignment and the generation of relevant text from documents, without adding any computational burden over a bi-encoder when GeAR is used as a retriever. GeAR generates from a fused representation of the query and the document, and its training data is synthesized with a pipeline built on large language models. Across diverse scenarios and datasets, GeAR shows competitive retrieval and localization performance, and its generated text helps interpret retrieval results.

📝 Abstract
Document retrieval techniques form the foundation for the development of large-scale information systems. The prevailing methodology is to construct a bi-encoder and compute the semantic similarity between the query and the document. However, a scalar similarity score conveys too little information, which impedes our understanding of the retrieval results. In addition, this computation mainly emphasizes global semantics and ignores the fine-grained semantic relationship between the query and the complex text in the document. In this paper, we propose a new method called $\textbf{Ge}$neration $\textbf{A}$ugmented $\textbf{R}$etrieval ($\textbf{GeAR}$) that incorporates well-designed fusion and decoding modules. This enables GeAR to generate the relevant text from documents based on the fused representation of the query and the document, thus learning to "focus on" the fine-grained information. Moreover, when used as a retriever, GeAR adds no computational burden over bi-encoders. To support the training of the new framework, we introduce a pipeline that efficiently synthesizes high-quality data using large language models. GeAR exhibits competitive retrieval and localization performance across diverse scenarios and datasets. Furthermore, the qualitative analysis and the results generated by GeAR provide novel insights into the interpretation of retrieval results. The code, data, and models will be released after completing technical review to facilitate future research.
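The contrast the abstract draws, between one scalar score per document and fine-grained localization inside a document, can be illustrated with a toy sketch. This is not the authors' implementation: the bag-of-words `encode` function is a hypothetical stand-in for a trained neural encoder, and `localize` scores sentences by plain cosine similarity rather than using GeAR's fusion and decoding modules. All function names and the example documents are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def encode(text):
    """Toy encoder: bag-of-words counts (stand-in for a neural text encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve(query, docs):
    """Bi-encoder style ranking: each document gets a single scalar score."""
    q = encode(query)
    scored = [(cosine(q, encode(d)), i) for i, d in enumerate(docs)]
    # Highest score first; ties broken by document index.
    return sorted(scored, key=lambda s: (-s[0], s[1]))


def localize(query, doc):
    """Crude localization: return the sentence of `doc` closest to the query."""
    q = encode(query)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]
    return max(sentences, key=lambda s: cosine(q, encode(s)))


docs = [
    "The Eiffel Tower is in Paris. It was completed in 1889. "
    "Millions visit it each year.",
    "Mount Everest is the highest mountain on Earth. It lies in the Himalayas.",
]
query = "What year was the tower completed?"

ranking = retrieve(query, docs)
best_index = ranking[0][1]
print("top document index:", best_index)
print("most relevant sentence:", localize(query, docs[best_index]))
```

The scalar score alone says only that the first document ranks highest; the sentence-level pass shows which part of it is relevant, which is the kind of information the abstract argues a single similarity value cannot convey.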
Problem

Research questions and friction points this paper is trying to address.

Semantic Similarity
Information Retrieval
Query-Document Relationship
Innovation

Methods, ideas, or system contributions that make the work stand out.

GeAR
Generation Augmented Retrieval
Fused Representation