MPAD: A New Dimension-Reduction Method for Preserving Nearest Neighbors in High-Dimensional Vector Search

📅 2025-04-23
🤖 AI Summary
In high-dimensional vector retrieval, conventional unsupervised dimensionality reduction methods (e.g., PCA, UMAP) often degrade retrieval accuracy because their optimization objectives are unrelated to retrieval and fail to preserve neighborhood structure. To address this, we propose MPAD, the first unsupervised dimensionality reduction method explicitly designed for retrieval. It maximizes the pairwise absolute distance difference between k-nearest neighbors and non-neighbors under a soft orthogonality constraint, directly optimizing the boundary that separates neighbors from non-neighbors, without labels or fine-tuning. MPAD combines a distance-sensitive loss with a geometry-preserving mechanism in a single end-to-end framework. Experiments across diverse benchmark datasets show that MPAD achieves 12–28% higher neighbor preservation rates than PCA and UMAP after dimensionality reduction, and that its retrieval accuracy closely approaches that of the original high-dimensional space, balancing precision and computational efficiency.
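The summary reports results as a "neighbor preservation rate" without defining it here. A minimal sketch of one common reading follows: the average fraction of each point's k nearest neighbors in the original space that are still among its k nearest neighbors after reduction. The function names `knn_indices` and `neighbor_preservation`, the toy data, and the use of Euclidean distance are assumptions for illustration, not taken from the paper.

```python
import math

def knn_indices(points, k):
    """For each point, return the set of indices of its k nearest neighbors (Euclidean)."""
    result = []
    for i, p in enumerate(points):
        # Rank every other point by its distance to p and keep the k closest.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        result.append(set(others[:k]))
    return result

def neighbor_preservation(original, reduced, k):
    """Average fraction of original-space k-NNs that remain k-NNs in the reduced space."""
    before = knn_indices(original, k)
    after = knn_indices(reduced, k)
    overlaps = [len(b & a) / k for b, a in zip(before, after)]
    return sum(overlaps) / len(overlaps)

# Hypothetical toy data: two clusters separated along x, with a misleading y coordinate.
original = [(0.0, 0.0), (1.0, 5.0), (10.0, 0.5), (11.0, 5.5)]
keep_x = [(x,) for x, _ in original]  # reduction that keeps only the x coordinate
keep_y = [(y,) for _, y in original]  # reduction that keeps only the y coordinate

print(neighbor_preservation(original, keep_x, k=1))
print(neighbor_preservation(original, keep_y, k=1))
```

In this toy case the x-only reduction keeps every nearest neighbor while the y-only reduction loses all of them, which is the kind of gap such a metric is meant to expose. The brute-force neighbor search is quadratic in the number of points and is only suitable for small examples.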

📝 Abstract
High-dimensional vector embeddings are widely used in retrieval systems, yet dimensionality reduction (DR) is seldom applied due to its tendency to distort nearest-neighbor (NN) structure critical for search. Existing DR techniques such as PCA and UMAP optimize global or manifold-preserving criteria, rather than retrieval-specific objectives. We present MPAD: Maximum Pairwise Absolute Difference, an unsupervised DR method that explicitly preserves approximate NN relations by maximizing the margin between k-NNs and non-k-NNs under a soft orthogonality constraint. This design enables MPAD to retain ANN-relevant geometry without supervision or changes to the original embedding model. Experiments across multiple domains show that MPAD consistently outperforms standard DR methods in preserving neighborhood structure, enabling more accurate search in reduced dimensions.

Problem

Research questions and friction points this paper is trying to address.

Preserves nearest-neighbor relations in high-dimensional search
Addresses distortion from standard dimensionality reduction methods
Optimizes retrieval-specific objectives without supervision

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unsupervised DR method preserving NN relations
Maximizes margin between k-NNs and non-k-NNs
Soft orthogonality constraint maintains ANN geometry
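
The points above can be illustrated with a small scoring function. This is a sketch of one possible reading of "margin between k-NNs and non-k-NNs under a soft orthogonality constraint", not the paper's actual objective: it rewards a linear projection `W` for placing each point's non-neighbors farther away than its k-NNs (neighbors are defined in the original space) and subtracts a penalty that grows as the rows of `W` stop being orthonormal. The names `project`, `orthogonality_penalty`, `margin_objective`, the weight `lam`, and the specific penalty form are all assumptions.

```python
import math

def project(W, x):
    """Apply an m x d projection matrix W (a list of rows) to a d-dimensional point x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def orthogonality_penalty(W):
    """Squared Frobenius norm of W W^T - I; zero exactly when the rows of W are orthonormal."""
    total = 0.0
    for a, row_a in enumerate(W):
        for b, row_b in enumerate(W):
            dot = sum(u * v for u, v in zip(row_a, row_b))
            total += (dot - (1.0 if a == b else 0.0)) ** 2
    return total

def margin_objective(W, points, k, lam):
    """Assumed objective: mean projected distance to non-neighbors minus mean projected
    distance to k-NNs, averaged over points, minus lam times the orthogonality penalty.
    Requires k < len(points) - 1 so that every point has at least one non-neighbor."""
    n = len(points)
    projected = [project(W, p) for p in points]
    margin = 0.0
    for i in range(n):
        # Neighbors are ranked in the ORIGINAL space, not the projected one.
        order = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        near, far = order[:k], order[k:]
        d_near = sum(math.dist(projected[i], projected[j]) for j in near) / len(near)
        d_far = sum(math.dist(projected[i], projected[j]) for j in far) / len(far)
        margin += d_far - d_near
    return margin / n - lam * orthogonality_penalty(W)

# Hypothetical toy data and two candidate 1-D projections.
points = [(0.0, 0.0), (1.0, 5.0), (10.0, 0.5), (11.0, 5.5)]
W_x = [[1.0, 0.0]]  # keeps the axis along which the clusters separate
W_y = [[0.0, 1.0]]  # keeps the misleading axis

print(margin_objective(W_x, points, k=1, lam=1.0))
print(margin_objective(W_y, points, k=1, lam=1.0))
```

A higher score indicates a projection that better separates neighbors from non-neighbors. In practice `W` would be optimized (for example by gradient ascent) rather than chosen by hand, and the penalty is "soft" in the sense that it discourages but does not forbid non-orthonormal rows.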