Statistical Foundations of DIME: Risk Estimation for Practical Index Selection

📅 2026-01-09
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the inefficiency of high-dimensional dense embeddings, which often contain noisy or redundant dimensions. DIME (Dimension IMportance Estimation) scores dimensions per query, but it relies on a costly grid search to fix in advance how many dimensions to keep for all queries, so the cutoff itself does not adapt to the query. To overcome this limitation, the paper introduces a query-aware dimension selection criterion grounded in statistical risk estimation. At inference time, the method estimates risk per query to decide which dimensions to keep and which to prune, removing the need for a globally predetermined dimensionality. Evaluated across multiple models and datasets, the approach reduces embedding size by approximately 50% on average while preserving retrieval effectiveness, improving both computational efficiency and query-level adaptivity.

📝 Abstract
High-dimensional dense embeddings have become central to modern Information Retrieval, but many dimensions are noisy or redundant. The recently proposed DIME (Dimension IMportance Estimation) provides query-dependent scores to identify informative components of embeddings. However, DIME relies on a costly grid search to select, a priori, a single dimensionality for the embeddings of the whole query corpus. Our work provides a statistically grounded criterion that directly identifies the optimal set of dimensions for each query at inference time. Experiments confirm that the criterion achieves parity of effectiveness while reducing embedding size by an average of $\sim50\%$ across different models and datasets.
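To make the contrast in the abstract concrete, the following minimal Python sketch compares a fixed, globally chosen cutoff with a cutoff decided per query. It is an illustration under stated assumptions, not the paper's method: the scoring function (query component times the centroid of a few feedback document embeddings), the function names, and the toy vectors are all hypothetical, and the per-query rule shown here (keep dimensions with a positive score) is only a stand-in for the statistical risk criterion, which this summary does not specify.

```python
import numpy as np


def importance_scores(query, feedback_docs):
    """Assumed scoring: query component times the feedback centroid component."""
    centroid = feedback_docs.mean(axis=0)
    return query * centroid


def select_fixed_fraction(scores, fraction):
    """Baseline: keep the same share of dimensions for every query.

    `fraction` is the value a grid search would have to tune in advance.
    """
    k = max(1, int(round(fraction * scores.size)))
    keep = np.zeros(scores.size, dtype=bool)
    keep[np.argsort(scores)[::-1][:k]] = True
    return keep


def select_per_query(scores, threshold=0.0):
    """Stand-in per-query rule (NOT the paper's risk criterion).

    Keeps every dimension whose score exceeds `threshold`, so the number of
    retained dimensions can differ from one query to the next.
    """
    keep = scores > threshold
    if not keep.any():
        # Never return an empty embedding: fall back to the best dimension.
        keep[np.argmax(scores)] = True
    return keep


def mask_query(query, keep):
    """Zero out the pruned dimensions of the query embedding."""
    return np.where(keep, query, 0.0)


# Toy 4-dimensional example with made-up values.
query = np.array([0.5, -0.2, 0.1, 0.4])
feedback_docs = np.array([
    [0.6, 0.3, -0.1, 0.2],
    [0.4, 0.1, -0.3, 0.4],
])

scores = importance_scores(query, feedback_docs)
fixed_keep = select_fixed_fraction(scores, fraction=0.75)
adaptive_keep = select_per_query(scores)

print("fixed cutoff keeps:   ", fixed_keep.tolist())
print("per-query rule keeps: ", adaptive_keep.tolist())
print("pruned query:         ", mask_query(query, adaptive_keep).tolist())
```

In this toy case the fixed 75% cutoff retains three dimensions, including one with a negative score, while the per-query rule retains only the two dimensions with positive scores. The point of the sketch is the interface difference: the baseline needs a tuned `fraction` shared by all queries, whereas a per-query criterion decides the retained set from the query's own scores.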
Problem

Research questions and friction points this paper is trying to address.

high-dimensional embeddings
dimension selection
information retrieval
noise reduction
query-dependent optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

DIME
dimension selection
risk estimation
query-dependent embedding
statistical criterion