A Representation Sharpening Framework for Zero-Shot Dense Retrieval

📅 2025-11-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Zero-shot dense retrieval suffers from semantic ambiguity among top-ranked documents because no relevance labels are available for training queries. To address this, the authors propose a training-free representation sharpening framework that enhances document embeddings via context-aware refinement at indexing time, improving semantic discriminability without modifying the underlying retriever. The method is agnostic to the choice of pre-trained dense retriever and includes an approximation strategy that balances effectiveness against computational overhead. Evaluated on more than twenty multilingual zero-shot benchmarks, including the BRIGHT benchmark, the approach achieves new state-of-the-art performance. Its approximate variant retains over 90% of the full method's gains while incurring no additional inference-time cost. The core contribution is the first unsupervised, fine-tuning-free, and computationally efficient document representation sharpening technique, which significantly alleviates semantic confusion in zero-shot dense retrieval.

📝 Abstract
Zero-shot dense retrieval is a challenging setting where a document corpus is provided without relevant queries, necessitating a reliance on pretrained dense retrievers (DRs). However, since these DRs are not trained on the target corpus, they struggle to represent semantic differences between similar documents. To address this failing, we introduce a training-free representation sharpening framework that augments a document's representation with information that helps differentiate it from similar documents in the corpus. On over twenty datasets spanning multiple languages, the representation sharpening framework proves consistently superior to traditional retrieval, setting a new state-of-the-art on the BRIGHT benchmark. We show that representation sharpening is compatible with prior approaches to zero-shot dense retrieval and consistently improves their performance. Finally, we address the performance-cost tradeoff presented by our framework and devise an indexing-time approximation that preserves the majority of our performance gains over traditional retrieval, yet suffers no additional inference-time cost.
Problem

Research questions and friction points this paper is trying to address.

Improving zero-shot dense retrieval without training on target corpus
Enhancing document representation to distinguish similar documents
Addressing performance-cost tradeoff in retrieval framework
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free representation sharpening framework for dense retrieval
Augments document representation to differentiate similar documents
Indexing-time approximation preserves gains with no inference cost
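The paper does not reproduce its exact update rule on this page, but the core idea in the Innovation list (augmenting a document's embedding so it stands apart from similar documents in the corpus) can be sketched as a neighbor-aware refinement at indexing time. In this hypothetical sketch, `sharpen_embeddings`, `k`, and `alpha` are illustrative assumptions, not the authors' published formulation:

```python
import numpy as np

def sharpen_embeddings(doc_embs, k=8, alpha=0.3):
    """Hypothetical sketch of indexing-time representation sharpening:
    subtract a fraction of each document's nearest-neighbor centroid from
    its embedding, so that similar documents become easier to tell apart.
    The update rule and alpha are illustrative, not the paper's method."""
    # Unit-normalize so dot products are cosine similarities.
    X = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    sims = X @ X.T
    np.fill_diagonal(sims, -np.inf)  # a document is not its own neighbor
    sharpened = np.empty_like(X)
    for i in range(len(X)):
        nbrs = np.argsort(sims[i])[-k:]   # indices of the k most similar docs
        centroid = X[nbrs].mean(axis=0)   # shared context among neighbors
        v = X[i] - alpha * centroid       # push away from the shared context
        sharpened[i] = v / np.linalg.norm(v)
    return sharpened
```

Because the refinement happens entirely during indexing, query-time retrieval is unchanged, which is consistent with the page's claim that the approximation adds no inference-time cost.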
Dhananjay Ashok
Information Sciences Institute, University of Southern California
Suraj Nair
Amazon
Mutasem Al-Darabsah
Amazon
C. Teo
Amazon
Tarun Agarwal
Amazon
Jonathan May
University of Southern California, Information Sciences Institute
Machine Translation · Machine Learning · Natural Language Processing