On Strengths and Limitations of Single-Vector Embeddings

📅 2026-03-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the pronounced performance degradation of single-vector embeddings in retrieval tasks, whose reliability has come under scrutiny. Leveraging the LIMIT and MSMARCO datasets, the authors conduct a systematic analysis through fine-tuning, controlled experiments, and mathematical modeling. They find that dimensional constraints are not the primary cause; rather, domain shift and a misalignment between embedding similarity and task-specific relevance are central to the issue. Single-vector models are shown to be particularly susceptible to the “document drowning” effect and exhibit heightened sensitivity to corpus size expansion. Although fine-tuning improves recall, it triggers catastrophic forgetting, whereas multi-vector models consistently outperform their single-vector counterparts. This work is the first to identify relevance misalignment and domain shift as fundamental limitations underlying the shortcomings of single-vector embeddings.
📝 Abstract
Recent work (Weller et al., 2025) introduced a naturalistic dataset called LIMIT and showed empirically that a wide range of popular single-vector embedding models suffer substantial drops in retrieval quality, raising concerns about the reliability of single-vector embeddings for retrieval. Although Weller et al. (2025) proposed limited dimensionality as the main factor contributing to this, we show that dimensionality alone cannot explain the observed failures. We observe from results in (Alon et al., 2016) that $(2k+1)$-dimensional vector embeddings suffice for top-$k$ retrieval. This result points to other drivers of poor performance. Controlling for tokenization artifacts and linguistic similarity between attributes yields only modest gains. In contrast, we find that domain shift and misalignment between embedding similarities and the task's underlying notion of relevance are major contributors; finetuning mitigates these effects and can improve recall substantially. Even with finetuning, however, single-vector models remain markedly weaker than multi-vector representations, pointing to fundamental limitations. Moreover, finetuning single-vector models on LIMIT-like datasets leads to catastrophic forgetting (performance on MSMARCO drops by more than 40%), whereas forgetting for multi-vector models is minimal. To better understand the performance gap between single-vector and multi-vector models, we study the drowning in documents paradox (Reimers & Gurevych, 2021; Jacob et al., 2025): as the corpus grows, relevant documents are increasingly "drowned out" because embedding similarities behave, in part, like noisy statistical proxies for relevance. Through experiments and calculations on toy mathematical models, we illustrate why single-vector models are more susceptible to drowning effects than multi-vector models.
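The "drowning in documents" effect described above — a relevant document increasingly outranked as the corpus grows, because embedding similarity acts as a noisy proxy for relevance — can be illustrated with a toy simulation. This sketch is not from the paper; the additive-Gaussian-noise score model is an assumption chosen for simplicity:

```python
import random

def drowning_rate(corpus_size, noise, trials=2000, seed=0):
    """Fraction of trials in which the relevant document is outranked.

    Toy model (an assumption, not the paper's setup): the one relevant
    document has true relevance 1.0, all distractors 0.0, and the
    retrieval score is relevance plus Gaussian noise of the given scale.
    """
    rng = random.Random(seed)
    drowned = 0
    for _ in range(trials):
        relevant_score = 1.0 + rng.gauss(0.0, noise)
        # "Drowned" if any of the corpus_size distractors scores higher.
        if any(rng.gauss(0.0, noise) > relevant_score
               for _ in range(corpus_size)):
            drowned += 1
    return drowned / trials

for n in (10, 100, 1000):
    print(n, drowning_rate(n, noise=0.5))
```

Even with the noise scale held fixed, the drowning rate climbs with corpus size, since the maximum of many noisy distractor scores grows — mirroring the sensitivity to corpus-size expansion that the paper reports for single-vector models.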
Problem

Research questions and friction points this paper is trying to address.

single-vector embeddings
retrieval quality
domain shift
catastrophic forgetting
drowning in documents
Innovation

Methods, ideas, or system contributions that make the work stand out.

single-vector embeddings
domain shift
catastrophic forgetting
drowning in documents
retrieval reliability
Archish S
Microsoft Research India, Bengaluru, India
Mihir Agarwal
Microsoft Research India, Bengaluru, India
Ankit Garg
Microsoft Research
Theoretical Computer Science
Neeraj Kayal
Microsoft Research India, Bengaluru, India
Kirankumar Shiragur
Senior Researcher at Microsoft Research
Retrieval
Vector Search