Learning to Select: Query-Aware Adaptive Dimension Selection for Dense Retrieval

📅 2026-02-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses query-level redundancy in high-dimensional dense retrieval embeddings: fixed or heuristic dimension selection strategies struggle to match the varying information needs of different queries. The authors propose a query-aware adaptive dimension selection framework that treats dimension importance as a learnable prediction task. They use supervised relevance labels to construct a dimension importance distribution, then train a lightweight predictor to estimate per-dimension weights directly from the query embedding, enabling query-specific dimension pruning without pseudo-relevance feedback. Combining label distillation, a query-driven selector, and a dense retrieval model, the approach improves retrieval effectiveness over full-dimension baselines, PRF-based masking, and supervised adapter methods across multiple benchmarks.
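The inference step described in the summary can be illustrated with a minimal sketch. It assumes a linear-plus-softmax predictor, a top-k selection rule, and dot-product similarity; none of these details are confirmed by the summary, and the names `predict_weights`, `select_dimensions`, `masked_scores`, `W`, `b`, and `k` are hypothetical.

```python
import numpy as np

def predict_weights(query: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Hypothetical lightweight predictor: linear layer + softmax over dimensions.

    The paper's actual predictor architecture is not specified here; this is
    only a stand-in that maps a query embedding to per-dimension weights.
    """
    logits = W @ query + b
    exp = np.exp(logits - logits.max())  # subtract max for numerical stability
    return exp / exp.sum()

def select_dimensions(weights: np.ndarray, k: int) -> np.ndarray:
    """Return a 0/1 mask that keeps the k highest-weighted dimensions (assumed rule)."""
    mask = np.zeros(weights.shape, dtype=float)
    top_k = np.argsort(weights)[-k:]  # indices of the k largest weights
    mask[top_k] = 1.0
    return mask

def masked_scores(query: np.ndarray, docs: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Dot-product similarity computed only over the selected dimensions."""
    return docs @ (query * mask)

# Toy usage: 4-dimensional embeddings, keep the 2 most important dimensions.
weights = np.array([0.10, 0.50, 0.05, 0.35])
mask = select_dimensions(weights, k=2)
query = np.array([1.0, 2.0, 3.0, 4.0])
docs = np.array([[1.0, 1.0, 1.0, 1.0],
                 [0.0, 1.0, 0.0, 0.0]])
scores = masked_scores(query, docs, mask)
```

Because the mask depends only on the query embedding, document embeddings can stay unchanged in the index and no feedback pass over retrieved documents is needed.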

📝 Abstract

Dense retrieval represents queries and documents as high-dimensional embeddings, but these representations can be redundant at the query level: for a given information need, only a subset of dimensions is consistently helpful for ranking. Prior work addresses this via pseudo-relevance feedback (PRF) based dimension importance estimation, which can produce query-aware masks without labeled data but often relies on noisy pseudo signals and heuristic test-time procedures. In contrast, supervised adapter methods leverage relevance labels to improve embedding quality, yet they learn global transformations shared across queries and do not explicitly model query-aware dimension importance. We propose a Query-Aware Adaptive Dimension Selection framework that learns to predict per-dimension importance directly from the query embedding. We first construct oracle dimension importance distributions over embedding dimensions using supervised relevance labels, and then train a predictor to map a query embedding to these label-distilled importance scores. At inference, the predictor selects a query-aware subset of dimensions for similarity computation based solely on the query embedding, without pseudo-relevance feedback. Experiments across multiple dense retrievers and benchmarks show that our learned dimension selector improves retrieval effectiveness over the full-dimensional baseline as well as PRF-based masking and supervised adapter baselines.
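The training side of the abstract (building oracle importance distributions from relevance labels, then distilling them into a predictor) might look roughly like the sketch below. The abstract does not give the oracle formula or the loss, so both are assumptions: importance is taken as each dimension's contribution to the score margin between relevant and non-relevant documents, normalized with a softmax, and the distillation objective is taken to be KL divergence. The names `oracle_importance`, `kl_divergence`, and `temperature` are hypothetical.

```python
import numpy as np

def oracle_importance(query: np.ndarray,
                      pos_docs: np.ndarray,
                      neg_docs: np.ndarray,
                      temperature: float = 1.0) -> np.ndarray:
    """Assumed oracle: softmax over per-dimension contributions to the
    dot-product margin between labeled relevant and non-relevant documents."""
    # Dimension i contributes query[i] * (mean_pos[i] - mean_neg[i]) to the margin.
    contrib = query * (pos_docs.mean(axis=0) - neg_docs.mean(axis=0))
    logits = contrib / temperature
    exp = np.exp(logits - logits.max())  # subtract max for numerical stability
    return exp / exp.sum()

def kl_divergence(target: np.ndarray, predicted: np.ndarray, eps: float = 1e-12) -> float:
    """KL(target || predicted), used here as an assumed distillation loss."""
    return float(np.sum(target * (np.log(target + eps) - np.log(predicted + eps))))

# Toy usage: dimension 0 separates the relevant document, dimension 1 does not.
query = np.array([1.0, 1.0])
pos_docs = np.array([[2.0, 0.0]])
neg_docs = np.array([[0.0, 0.0]])
target = oracle_importance(query, pos_docs, neg_docs)

# A predictor that outputs a uniform distribution is penalized by the loss.
uniform = np.full(2, 0.5)
loss = kl_divergence(target, uniform)
```

Under these assumptions, a predictor trained to minimize this loss over labeled queries would learn to reproduce the oracle distribution from the query embedding alone, which is what allows the labels to be dropped at inference.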

Problem

Research questions and friction points this paper is trying to address.

dense retrieval

dimension selection

query-aware

embedding redundancy

relevance ranking

Innovation

Methods, ideas, or system contributions that make the work stand out.

Query-Aware Dimension Selection

Dense Retrieval

Embedding Adaptation