Exploring Reasoning-Infused Text Embedding with Large Language Models for Zero-Shot Dense Retrieval

📅 2025-08-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing LLM-based text embedding methods emphasize contextual encoding while neglecting the models' logical reasoning capabilities, resulting in suboptimal performance on zero-shot dense retrieval tasks that require complex reasoning. Method: We propose RITE, a novel framework that explicitly integrates a generative large language model's reasoning ability into the embedding process. RITE enriches raw text semantics by generating intermediate reasoning paths and constructs reasoning-enriched dense vectors directly in token space, without fine-tuning or additional training. Contribution/Results: Evaluated on reasoning-intensive retrieval benchmarks (e.g., BRIGHT), RITE substantially outperforms state-of-the-art zero-shot embedding methods. It demonstrates strong cross-domain robustness and computational efficiency, establishing a new paradigm for reasoning-augmented information retrieval.

📝 Abstract
Transformer-based models such as BERT and E5 have significantly advanced text embedding by capturing rich contextual representations. However, many complex real-world queries require sophisticated reasoning to retrieve relevant documents beyond surface-level lexical matching, where encoder-only retrievers often fall short. Decoder-only large language models (LLMs), known for their strong reasoning capabilities, offer a promising alternative. Despite this potential, existing LLM-based embedding methods primarily focus on contextual representation and do not fully exploit the reasoning strength of LLMs. To bridge this gap, we propose Reasoning-Infused Text Embedding (RITE), a simple but effective approach that integrates logical reasoning into the text embedding process using generative LLMs. RITE builds upon existing language model embedding techniques by generating intermediate reasoning texts in the token space before computing embeddings, thereby enriching representations with inferential depth. Experimental results on BRIGHT, a reasoning-intensive retrieval benchmark, demonstrate that RITE significantly enhances zero-shot retrieval performance across diverse domains, underscoring the effectiveness of incorporating reasoning into the embedding process.
Problem

Research questions and friction points this paper is trying to address.

Enhancing text embeddings with reasoning for retrieval
Bridging reasoning gaps in LLM-based dense retrieval
Improving zero-shot retrieval via reasoning-infused embeddings
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates logical reasoning into text embedding
Generates intermediate reasoning texts before computing embeddings
Enhances zero-shot retrieval with inferential depth
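The core reason-then-embed idea above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `generate_reasoning` stands in for a prompted decoder-only LLM that elaborates the query, and `embed` is a toy hashed bag-of-words encoder standing in for an LLM-based embedder. The key step, per the abstract, is enriching the input with generated reasoning text in token space before computing the embedding, with no fine-tuning involved.

```python
# Hedged sketch of the RITE pipeline (reason-then-embed).
# Both helpers below are hypothetical stand-ins, not the paper's models.
import hashlib
import math

def generate_reasoning(query: str) -> str:
    # Placeholder for a generative LLM call that writes an intermediate
    # reasoning text (e.g., a chain-of-thought elaboration of the query).
    return f"To answer '{query}', consider the underlying concepts it implies."

def embed(text: str, dim: int = 8) -> list[float]:
    # Placeholder encoder: hash tokens into a bag-of-words vector and
    # L2-normalize. A real system would use an LLM-based embedding model.
    vec = [0.0] * dim
    for token in text.lower().split():
        h = int(hashlib.md5(token.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def rite_embed(query: str) -> list[float]:
    # Core RITE step: augment the raw query with generated reasoning text
    # in token space, then embed the enriched input (training-free).
    reasoning = generate_reasoning(query)
    return embed(query + " " + reasoning)

def cosine(a: list[float], b: list[float]) -> float:
    # Dot product suffices here because embed() returns unit vectors.
    return sum(x * y for x, y in zip(a, b))
```

In a retrieval setting, queries would be embedded with `rite_embed` while documents are embedded as-is, and candidates ranked by cosine similarity against the reasoning-enriched query vector.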
Authors
Yuxiang Liu — University of Illinois at Urbana-Champaign
Tian Wang — Amazon
Gourab Kundu — Amazon
Tianyu Cao — Amazon
Guang Cheng — Amazon
Zhen Ge — Amazon
Jianshu Chen — Principal Scientist, Amazon
Qingjun Cui — Amazon
Trishul Chilimbi — Sr. Principal Scientist, Amazon