🤖 AI Summary
In high-dimensional vector retrieval, conventional unsupervised dimensionality reduction methods (e.g., PCA, UMAP) often degrade retrieval accuracy because their optimization objectives—unrelated to retrieval—fail to preserve neighborhood structure. To address this, we propose MPAD, the first unsupervised dimensionality reduction method explicitly designed for retrieval: it maximizes the pairwise absolute distance difference between k-nearest neighbors and non-neighbors under a soft orthogonality constraint, thereby directly optimizing the discriminative boundary for nearest-neighbor identification—without labels or fine-tuning. MPAD is formulated as an end-to-end framework that integrates a distance-sensitive loss with a geometry-preserving mechanism. Extensive experiments across diverse benchmark datasets show that MPAD achieves 12–28% higher neighbor preservation rates than PCA and UMAP after dimensionality reduction, and its retrieval accuracy closely approaches that of search in the original high-dimensional space—effectively balancing precision and computational efficiency.
📝 Abstract
High-dimensional vector embeddings are widely used in retrieval systems, yet dimensionality reduction (DR) is seldom applied due to its tendency to distort the nearest-neighbor (NN) structure critical for search. Existing DR techniques such as PCA and UMAP optimize global or manifold-preserving criteria rather than retrieval-specific objectives. We present MPAD: Maximum Pairwise Absolute Difference, an unsupervised DR method that explicitly preserves approximate NN relations by maximizing the margin between k-NNs and non-k-NNs under a soft orthogonality constraint. This design enables MPAD to retain ANN-relevant geometry without supervision or changes to the original embedding model. Experiments across multiple domains show that MPAD consistently outperforms standard DR methods in preserving neighborhood structure, enabling more accurate search in reduced dimensions.
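To make the objective concrete, the following is a minimal sketch of an MPAD-style loss, not the paper's actual formulation: a linear projection `W` is scored by the margin between each point's reduced-space distances to its original-space k-NNs and to its non-neighbors, plus a soft orthogonality penalty on `W`. The function name `mpad_loss`, the exact margin form, and the penalty weight `lam` are all illustrative assumptions.

```python
import numpy as np

def mpad_loss(X, W, k=5, lam=0.1):
    """Hypothetical MPAD-style objective (sketch, not the paper's exact loss).

    X: (n, D) original embeddings; W: (D, d) linear projection.
    Rewards reduced-space configurations where each point's non-neighbors
    lie farther away than its k-NNs (k-NNs defined in the original space,
    since that is the structure to preserve), with a soft orthogonality
    penalty on W. Lower is better (we negate the margin to minimize).
    """
    n = X.shape[0]
    Z = X @ W  # reduced embeddings, shape (n, d)
    # Pairwise Euclidean distances in reduced and original spaces.
    D_red = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    D_orig = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    margin = 0.0
    for i in range(n):
        order = np.argsort(D_orig[i])
        nn = order[1 : k + 1]   # k nearest neighbors (skip self at index 0)
        far = order[k + 1 :]    # all non-neighbors
        # Absolute distance difference: non-neighbors minus neighbors.
        margin += D_red[i, far].mean() - D_red[i, nn].mean()
    # Soft orthogonality: penalize deviation of W^T W from identity.
    ortho = np.linalg.norm(W.T @ W - np.eye(W.shape[1])) ** 2
    return -(margin / n) + lam * ortho
```

In a full method this loss would be minimized over `W` by gradient descent; the sketch only evaluates it, which is enough to see how the neighbor/non-neighbor margin and the orthogonality term trade off.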