CRISP: Correlation-Resilient Indexing via Subspace Partitioning

📅 2026-03-05
🤖 AI Summary
This work addresses the challenges of high-dimensional (up to 4,096-dimensional) approximate nearest neighbor search, which suffers from excessive memory consumption, high indexing overhead, and suboptimal query efficiency. The authors propose CRISP, a novel framework that replaces conventional global rotation with a correlation-aware adaptive subspace partitioning strategy and introduces a lightweight variance redistribution mechanism. CRISP further incorporates a cache-friendly compressed sparse row (CSR) index structure and a dual-mode query engine—comprising Guaranteed Mode and Optimized Mode—that integrates rank-based weighted scoring with an early-termination strategy. This design ensures a theoretical recall lower bound while significantly accelerating query processing. Experimental results demonstrate that CRISP achieves state-of-the-art throughput, the lowest indexing overhead, and superior memory efficiency on high-dimensional datasets.

📝 Abstract
As the dimensionality of modern learned representations increases to thousands of dimensions, the state-of-the-art Approximate Nearest Neighbor (ANN) indices exhibit severe limitations. Graph-based methods (e.g., HNSW) suffer from prohibitive memory consumption and routing degradation, while recent randomized quantization and learned rotation approaches (e.g., RaBitQ, OPQ) impose significant preprocessing overheads. We introduce CRISP, a novel framework designed for ANN search in very-high-dimensional spaces. Unlike rigid pipelines that apply expensive orthogonal rotations indiscriminately, CRISP employs a lightweight, correlation-aware adaptive strategy that redistributes variance only when necessary, effectively reducing the preprocessing complexity. We couple this adaptive mechanism with a cache-coherent Compressed Sparse Row (CSR) index structure. Furthermore, CRISP incorporates a multi-stage dual-mode query engine: a Guaranteed Mode that preserves rigorous theoretical lower bounds on recall, and an Optimized Mode that leverages rank-based weighted scoring and early termination to reduce query latency. Extensive evaluation on datasets of very high dimensionality (up to 4096) demonstrates that CRISP achieves state-of-the-art query throughput, low construction costs, and peak memory efficiency.
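
The abstract names a Compressed Sparse Row (CSR) index structure but does not spell out its layout. As a rough illustration only, the sketch below shows the general CSR idea applied to bucketed vector IDs: all IDs are packed into one flat array, and a second array of offsets marks where each bucket starts. The function names, the notion of a "cell", and the input format are assumptions made for this example, not details taken from the paper.

```python
from itertools import accumulate


def build_csr(assignments, num_cells):
    """Pack per-cell ID lists into two flat arrays (generic CSR layout).

    assignments[i] is the cell that vector i falls into (hypothetical input).
    Returns (offsets, ids); the IDs of cell c are ids[offsets[c]:offsets[c + 1]].
    """
    # Count how many vectors land in each cell.
    counts = [0] * num_cells
    for cell in assignments:
        counts[cell] += 1

    # Prefix sums give the start position of each cell's slice.
    offsets = [0] + list(accumulate(counts))

    # Scatter each vector ID into the next free slot of its cell.
    ids = [0] * len(assignments)
    cursor = offsets[:-1].copy()
    for vec_id, cell in enumerate(assignments):
        ids[cursor[cell]] = vec_id
        cursor[cell] += 1
    return offsets, ids


def cell_members(offsets, ids, cell):
    """Return the contiguous slice of vector IDs stored for one cell."""
    return ids[offsets[cell]:offsets[cell + 1]]


# Five vectors assigned to three cells.
offsets, ids = build_csr([2, 0, 2, 1, 0], num_cells=3)
print(offsets)                        # [0, 2, 3, 5]
print(cell_members(offsets, ids, 2))  # [0, 2]
```

Because each cell's members sit in one contiguous slice, scanning a cell touches sequential memory rather than following pointers, which is the usual reason a CSR layout is described as cache-friendly.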

Problem

Research questions and friction points this paper is trying to address.

Approximate Nearest Neighbor
high-dimensional indexing
memory efficiency
preprocessing overhead
correlation resilience

Innovation

Methods, ideas, or system contributions that make the work stand out.

adaptive variance redistribution
cache-coherent CSR indexing
dual-mode query engine
correlation-aware preprocessing
high-dimensional ANN
Dimitris Dimitropoulos
U. of Ioannina & Archimedes, Athena RC
Achilleas Michalopoulos
U. of Ioannina
Dimitrios Tsitsigkos
Archimedes, Athena RC
Nikos Mamoulis
University of Ioannina
Data Management, Data Mining, Spatial Databases