More Than Efficiency: Embedding Compression Improves Domain Adaptation in Dense Retrieval

📅 2026-01-20
🤖 AI Summary
This work addresses the performance degradation of dense retrieval models under domain shift, a challenge exacerbated by the high cost of labeled data and retraining in conventional domain adaptation approaches. The authors propose a lightweight, annotation-free adaptation method that requires neither fine-tuning nor additional supervision. By applying Principal Component Analysis (PCA) to compress query embeddings, the method preserves domain-discriminative features while filtering out non-essential components. Evaluated across nine retrievers and fourteen MTEB datasets, this approach improves NDCG@10 in 75.4% of model–dataset combinations through query embedding compression alone. The technique simultaneously enhances computational efficiency and cross-domain retrieval effectiveness, offering a novel paradigm for efficient domain adaptation in dense retrieval systems.
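For readers unfamiliar with the metric reported above, the following is a minimal sketch of how NDCG@10 can be computed for a single query. It assumes a linear gain (the relevance grade itself) and a log2 rank discount; some evaluation toolkits use an exponential gain instead, and the summary does not say which variant was used. The function names `dcg_at_k` and `ndcg_at_k` are illustrative only.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k results (linear gain, assumed)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, all_relevances, k=10):
    """NDCG@k: DCG of the system ranking divided by the DCG of the ideal ranking.

    ranked_relevances: relevance grades in the order the retriever returned them.
    all_relevances:    relevance grades of every judged document for the query.
    """
    ideal = dcg_at_k(sorted(all_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0


# One relevant document, retrieved at rank 2 instead of rank 1.
score = ndcg_at_k([0, 1, 0], [1, 0, 0])
```

A dataset-level score is usually the mean of this value over all queries; an "improvement" for a model–dataset combination then means the mean is higher with compressed query embeddings than without.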

📝 Abstract
Dense retrievers powered by pretrained embeddings are widely used for document retrieval but struggle in specialized domains because the target-domain distribution differs from the training distribution. Domain adaptation typically requires costly annotation of query-document pairs followed by retraining. In this work, we revisit an overlooked alternative: applying PCA to domain embeddings to derive lower-dimensional representations that preserve domain-relevant features while discarding non-discriminative components. Although such compression is traditionally used for efficiency, we demonstrate that it can also improve retrieval performance. Evaluated across 9 retrievers and 14 MTEB datasets, PCA applied solely to query embeddings improves NDCG@10 in 75.4% of model-dataset pairs, offering a simple and lightweight method for domain adaptation.
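The idea of applying PCA only to query embeddings can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the abstract does not say how compressed queries are scored against documents, so this sketch fits PCA on in-domain query embeddings, keeps the top components, and maps the queries back into the original space so that unmodified document embeddings can still be ranked by cosine similarity. The helper names, the random stand-in embeddings, and the choice of 16 components are all hypothetical.

```python
import numpy as np


def fit_pca(embeddings, n_components):
    """Return the mean and the top principal directions of the embeddings."""
    mean = embeddings.mean(axis=0)
    centered = embeddings - mean
    # Rows of vt are orthonormal principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]


def project_queries(queries, mean, components):
    """Keep only the retained components, then map back to the original space (assumed)."""
    coords = (queries - mean) @ components.T  # low-dimensional coordinates
    return coords @ components + mean         # same dimensionality as the documents


def rank_documents(queries, docs, top_k=10):
    """Indices of the top_k documents per query by cosine similarity."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    return np.argsort(-(q @ d.T), axis=1)[:, :top_k]


# Random stand-ins for retriever outputs; real embeddings would come from a model.
rng = np.random.default_rng(0)
domain_queries = rng.normal(size=(200, 64))
docs = rng.normal(size=(500, 64))

mean, components = fit_pca(domain_queries, n_components=16)
filtered = project_queries(domain_queries[:5], mean, components)
ranking = rank_documents(filtered, docs)
```

No labels or gradient updates are involved: the only fitted quantities are the mean and the principal directions, which is what makes this kind of adaptation annotation-free. The number of retained components is a tuning choice that the abstract does not specify.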
Problem

Research questions and friction points this paper is trying to address.

domain adaptation
dense retrieval
embedding mismatch
specialized domains
retrieval performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

embedding compression
domain adaptation
dense retrieval
PCA
query embedding