NasZip: Software and Hardware Co-Design to Accelerate Approximate Nearest Neighbor Search with DIMM-Based Near-Data Processing

📅 2026-05-20
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the memory bandwidth bottleneck in high-dimensional approximate nearest neighbor search (ANNS) on CPUs and GPUs, where conventional early-exiting techniques yield limited gains because the partial distance converges too slowly to the exit threshold. The authors propose a hardware-software co-design that integrates DIMM-level near-data processing (NDP) with a feature-level early-exiting mechanism guided by statistics-based PCA, using estimation and correction parameters to approximate full-dimensional distances accurately. They also introduce a bit-level, NDP-aware dynamic-float scheme that reduces memory access for vector data, a data-aware neighbor list mapping that reduces neighbor retrieval latency and inter-channel communication overhead, and a dedicated cache that exploits data locality. At equal accuracy, the proposed system achieves speedups of up to 8.4× over a CPU baseline and up to 1.4× over a state-of-the-art GPU implementation, and it outperforms the state-of-the-art NDP ANNS accelerator, ANSMET, by 1.69×.
📝 Abstract
As large language models (LLMs) continue to advance, retrieval-augmented generation (RAG) has become the key mechanism for expanding model knowledge and reducing hallucinations. Central to RAG is approximate nearest neighbor search (ANNS), which retrieves the database vectors most similar to a given query. However, distance calculation over high-dimensional vectors is inherently memory-bound, so retrieval performance is constrained by I/O bandwidth on mainstream platforms such as CPUs and GPUs. Although many prior early exiting (EE) techniques attempt to reduce memory accesses by computing only partial dimensions, the partial distance converges too slowly to the EE threshold, which ultimately limits their performance gains. To address these challenges, we propose NasZip, a hardware-software co-designed framework that integrates near-data processing (NDP) with a novel feature-level early exiting guided by statistics-based principal component analysis (PCA). Instead of relying solely on partial distances, NasZip incorporates estimation and correction parameters to approximate full-dimensional distances accurately, enabling earlier exiting without compromising accuracy. We further introduce a bit-level NDP-aware dynamic-float scheme that significantly reduces memory access for vector data. On the hardware side, we develop a data-aware neighbor list mapping strategy that reduces neighbor retrieval latency and inter-channel communication overhead, complemented by a dedicated cache that exploits data locality and enhances prefetch efficiency. With these co-optimized techniques, NasZip delivers speedups of up to $8.4\times$ and $1.4\times$ over a CPU baseline and a state-of-the-art GPU implementation, respectively, at equal accuracy. Relative to the state-of-the-art NDP ANNS accelerator ANSMET, NasZip achieves a $1.69\times$ performance improvement.
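To make the baseline idea concrete, the following is a minimal sketch of conventional partial-distance early exiting, the prior technique the abstract says converges too slowly. It is not the paper's estimate-and-correct scheme. The function name `partial_distance_early_exit`, the `block` size, and the `threshold` value are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def partial_distance_early_exit(query, candidate, threshold, block=16):
    """Accumulate squared L2 distance block by block (hypothetical helper).

    Returns (distance, dims_read). distance is None when the partial sum
    already exceeds `threshold`, so the remaining dimensions are never read.
    """
    partial = 0.0
    dims = len(query)
    for start in range(0, dims, block):
        diff = query[start:start + block] - candidate[start:start + block]
        partial += float(np.dot(diff, diff))
        # The partial squared distance never decreases, so once it passes
        # the threshold the full distance must also exceed it.
        if partial > threshold:
            return None, min(start + block, dims)
    return partial, dims


query = np.zeros(64)
far = np.ones(64)          # rejected after the first block
near = np.full(64, 0.1)    # needs all 64 dimensions

far_result = partial_distance_early_exit(query, far, threshold=10.0)
near_result = partial_distance_early_exit(query, near, threshold=10.0)
```

In this sketch the far candidate is rejected after reading 16 of 64 dimensions, while the near candidate requires every dimension. When the partial sum grows slowly relative to the threshold, few candidates exit early, which is the limitation the abstract attributes to prior EE techniques.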
Problem

Research questions and friction points this paper is trying to address.

Approximate Nearest Neighbor Search
Memory-Bound Computation
Early Exiting
Retrieval-Augmented Generation
Near-Data Processing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Near-Data Processing
Approximate Nearest Neighbor Search
Hardware-Software Co-Design
Early Exiting
Principal Component Analysis
Cheng Zou
Intelligent Computing Research Group, School of Computer Science, Shanghai Jiao Tong University, Shanghai, CN
Shuo Yang
School of Integrated Circuits, Shanghai Jiao Tong University, Shanghai, CN; Intelligent Computing Research Group, School of Computer Science, Shanghai Jiao Tong University, Shanghai, CN
Chen Nie
Intelligent Computing Research Group, School of Computer Science, Shanghai Jiao Tong University, Shanghai, CN; Shanghai AI Laboratory, Shanghai, CN
Yu Zou
Institute of Information Engineering, Chinese Academy of Sciences, Beijing, CN
Yu He
Lenovo Research, Beijing, CN
Chao Jiang
Lenovo Research, Beijing, CN
Limin Xiao
FDU
Fiber Optics, Optoelectronics
Weifeng Zhang
Corp VP & Head of Intelligent Computing Lab at Lenovo Research
AI HW SW Co-Design, Computer Architecture, Heterogeneous Computing, AI/ML, GPU Optimizations
Zhezhi He
Associate Professor, Shanghai Jiao Tong University
Intelligent Computing, Neuromorphic Computing, Computer Architecture, EDA