LaSER: Internalizing Explicit Reasoning into Latent Space for Dense Retrieval

📅 2026-03-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of current dense retrieval methods, which either underutilize the reasoning capabilities of large language models (LLMs) by treating them as static encoders or suffer from high latency due to autoregressive generation in explicit reasoning. To bridge this gap, we propose LaSER, a novel framework that internalizes explicit reasoning paths into the retriever’s latent space through trajectory alignment, enabling efficient implicit reasoning without intermediate text generation. LaSER employs a dual-view training mechanism—explicit and implicit—on a shared LLM backbone, enhanced by multi-granularity alignment and self-distillation to jointly optimize reasoning depth and retrieval efficiency. Extensive experiments demonstrate that LaSER significantly outperforms existing approaches on both in-domain and cross-domain reasoning-intensive retrieval benchmarks, exhibiting strong robustness and effectiveness across varying model scales.

📝 Abstract
LLMs have fundamentally transformed dense retrieval, upgrading backbones from discriminative encoders to generative architectures. However, a critical disconnect remains: while LLMs possess strong reasoning capabilities, current retrievers predominantly utilize them as static encoders, leaving their potential for complex reasoning unexplored. To address this, existing approaches typically adopt rewrite-then-retrieve pipelines that generate explicit chain-of-thought (CoT) rationales before retrieval, but this incurs prohibitive latency. In this paper, we propose LaSER, a novel self-distillation framework that internalizes explicit reasoning into the latent space of dense retrievers. Operating on a shared LLM backbone, LaSER introduces a dual-view training mechanism: an Explicit view that encodes ground-truth reasoning paths, and a Latent view that performs implicit latent thinking. To bridge the gap between these views, we design a multi-grained alignment strategy. Beyond standard output alignment, we introduce a trajectory alignment mechanism that synchronizes the intermediate latent states of the latent path with the semantic progression of the explicit reasoning segments. This allows the retriever to think silently and effectively, without autoregressive text generation. Extensive experiments on both in-domain and out-of-domain reasoning-intensive benchmarks demonstrate that LaSER significantly outperforms state-of-the-art baselines. Furthermore, analyses across diverse backbones and model scales validate the robustness of our approach, confirming that our unified learning framework is essential for eliciting effective latent thinking. Our method combines the reasoning depth of explicit CoT pipelines with the inference efficiency of standard dense retrievers.
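The abstract describes two alignment signals between the views: output alignment on the final embeddings, and trajectory alignment that matches each intermediate latent state to the corresponding explicit reasoning segment. The paper's actual loss functions and weightings are not given here, so the following is only a minimal numpy sketch of that idea; all names, shapes, and the choice of cosine distance and MSE are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity between two vectors (small epsilon for stability)."""
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

# Hypothetical setup: the Explicit view yields one embedding per reasoning
# segment plus a final output embedding; the Latent view yields the same
# number of intermediate latent states plus its own output embedding.
rng = np.random.default_rng(0)
d = 16                  # embedding dimension (illustrative)
num_segments = 3        # reasoning segments in the explicit CoT path

explicit_segments = rng.normal(size=(num_segments, d))  # teacher states
latent_states     = rng.normal(size=(num_segments, d))  # student states
explicit_out      = rng.normal(size=d)
latent_out        = rng.normal(size=d)

# Output alignment: pull the Latent view's final embedding toward the
# Explicit view's final embedding.
output_loss = cosine_distance(latent_out, explicit_out)

# Trajectory alignment: match each intermediate latent state to the
# corresponding explicit reasoning segment, step by step.
trajectory_loss = float(np.mean((latent_states - explicit_segments) ** 2))

# Combined self-distillation objective (loss weighting omitted for brevity).
total_loss = output_loss + trajectory_loss
print(round(total_loss, 4))
```

In a real training loop these states would come from two forward passes of the shared LLM backbone, with gradients flowing only through the Latent view; here random vectors merely stand in for those representations.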
Problem

Research questions and friction points this paper is trying to address.

dense retrieval
large language models
reasoning
chain-of-thought
latency
Innovation

Methods, ideas, or system contributions that make the work stand out.

latent reasoning
dense retrieval
chain-of-thought
self-distillation
trajectory alignment