Breaking the Curse of Dimensionality: On the Stability of Modern Vector Retrieval

📅 2025-12-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
High-dimensional vector retrieval suffers from “distance concentration,” which makes nearest-neighbor (NN) results unstable: minor query perturbations can drastically change the NN set. This paper proposes a **stability-centric unified analytical framework**, the first to systematically address three mainstream retrieval paradigms: (1) multi-vector retrieval, deriving fidelity conditions for stability under the Chamfer distance; (2) filtered retrieval, revealing how large penalty mechanisms inherently promote stability; and (3) sparse vector retrieval, establishing a novel sufficient stability criterion under structured sparsity constraints. Theoretical predictions are validated on synthetic and real-world datasets and align closely with empirical observations. The results yield verifiable, actionable stability guarantees for vector index design, embedding training, and retrieval-augmented generation (RAG) systems.
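The two phenomena the summary names can be demonstrated in a few lines. The sketch below is our own illustration, not code from the paper: `relative_contrast` measures how distinguishable distances are (it shrinks as dimension grows, which is the distance-concentration effect), and `nn_set_overlap` measures stability as the Jaccard overlap of the top-k NN set before and after a small query perturbation.

```python
# Hypothetical illustration of distance concentration and NN stability
# (our own sketch; the paper's formal definitions may differ).
import numpy as np

def relative_contrast(points, query):
    """(d_max - d_min) / d_min for Euclidean distances from the query.
    Small values mean distances have concentrated and are hard to tell apart."""
    d = np.linalg.norm(points - query, axis=1)
    return (d.max() - d.min()) / d.min()

def nn_set_overlap(points, query, eps=0.05, k=10, seed=0):
    """Jaccard overlap of the top-k NN sets before and after perturbing
    the query by Gaussian noise of scale eps. 1.0 means fully stable."""
    rng = np.random.default_rng(seed)
    perturbed = query + eps * rng.standard_normal(query.shape)
    topk = lambda q: set(np.argsort(np.linalg.norm(points - q, axis=1))[:k])
    a, b = topk(query), topk(perturbed)
    return len(a & b) / len(a | b)

rng = np.random.default_rng(42)
for dim in (2, 512):
    pts = rng.standard_normal((2000, dim))
    q = rng.standard_normal(dim)
    print(dim, relative_contrast(pts, q), nn_set_overlap(pts, q))
```

Running this on random Gaussian data shows a much larger relative contrast in 2 dimensions than in 512, matching the classical concentration prediction.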

📝 Abstract
Modern vector databases enable efficient retrieval over high-dimensional neural embeddings, powering applications from web search to retrieval-augmented generation. However, classical theory predicts such tasks should suffer from the curse of dimensionality, where distances between points become nearly indistinguishable, thereby crippling efficient nearest-neighbor search. We revisit this paradox through the lens of stability, the property that small perturbations to a query do not radically alter its nearest neighbors. Building on foundational results, we extend stability theory to three key retrieval settings widely used in practice: (i) multi-vector search, where we prove that the popular Chamfer distance metric preserves single-vector stability, while average pooling aggregation may destroy it; (ii) filtered vector search, where we show that sufficiently large penalties for mismatched filters can induce stability even when the underlying search is unstable; and (iii) sparse vector search, where we formalize and prove novel sufficient stability conditions. Across synthetic and real datasets, our experimental results match our theoretical predictions, offering concrete guidance for model and system design to avoid the curse of dimensionality.
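The abstract's contrast between Chamfer-distance aggregation and average pooling can be made concrete. The sketch below uses our own notation (the paper's exact formulation may differ): Chamfer matches each query vector to its closest document vector, while average pooling collapses each vector set to its centroid before comparing, which can cancel out structure.

```python
# Minimal sketch of the two multi-vector aggregation schemes contrasted
# in the abstract (assumed notation, not the paper's code).
import numpy as np

def chamfer_distance(Q, D):
    """Asymmetric Chamfer distance: each row of Q is matched to its
    nearest row of D, and the per-vector minima are summed."""
    pair = np.linalg.norm(Q[:, None, :] - D[None, :, :], axis=-1)
    return pair.min(axis=1).sum()

def avg_pool_distance(Q, D):
    """Collapse both vector sets to their means, then compare once."""
    return np.linalg.norm(Q.mean(axis=0) - D.mean(axis=0))
```

As a toy example, a document with two opposite vectors `[1, 0]` and `[-1, 0]` averages to the origin, so average pooling reports distance 1 from the query `[1, 0]` even though Chamfer finds an exact match (distance 0) — the kind of structure loss the abstract attributes to pooling.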
Problem

Research questions and friction points this paper is trying to address.

Analyzes stability of vector retrieval in high-dimensional spaces
Extends stability theory to multi-vector, filtered, and sparse search
Provides guidance to avoid the curse of dimensionality
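For the sparse setting listed above, the paper's sufficient stability conditions are not reproduced here; the sketch below only illustrates the generic setting they apply to — inner-product scoring of sparse vectors over an inverted index, where scores accumulate only on shared nonzero terms.

```python
# Generic sparse inner-product retrieval over an inverted index
# (our illustration of the setting, not the paper's criterion).
from collections import defaultdict

def build_inverted_index(docs):
    """docs: list of {term: weight} sparse vectors.
    Returns term -> list of (doc_id, weight) postings."""
    index = defaultdict(list)
    for doc_id, vec in enumerate(docs):
        for term, w in vec.items():
            index[term].append((doc_id, w))
    return index

def sparse_top1(query, index):
    """Return the doc id with the largest inner product with the query,
    accumulating scores only over terms both sides have nonzero."""
    scores = defaultdict(float)
    for term, qw in query.items():
        for doc_id, dw in index.get(term, []):
            scores[doc_id] += qw * dw
    return max(scores, key=scores.get) if scores else None
```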
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends stability theory to multi-vector search with Chamfer distance
Shows filtered search gains stability via large filter penalties
Formalizes stability conditions for sparse vector search
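The filter-penalty mechanism above can be sketched in one scoring rule. The additive form `d(q, x) + lam * mismatch` and the symbol `lam` are our illustration, not necessarily the paper's formulation: once the penalty exceeds the spread of distances, every mismatched candidate ranks behind every matching one, so the returned neighbor depends only on the (smaller, filtered) candidate set.

```python
# Hedged sketch of penalty-based filtered search (assumed additive form).
import numpy as np

def filtered_nn(query, points, labels, want, lam):
    """Nearest neighbor under distance plus a penalty lam for every
    candidate whose label does not match the requested filter."""
    d = np.linalg.norm(points - query, axis=1)
    d = d + lam * (labels != want)
    return int(np.argmin(d))
```

With a small example, a mismatched point at distance 0 beats a matching point at distance 10 when `lam = 0`, but a large penalty flips the result to the matching point, which is the stabilizing effect described above.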