DRIM-ANN: An Approximate Nearest Neighbor Search Engine based on Commercial DRAM-PIMs

📅 2024-10-21
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
To address the I/O bottlenecks of CPU-based approximate nearest neighbor search (ANNS) and the memory capacity limits of GPUs, this work is the first to use commercial DRAM-based processing-in-memory (DRAM-PIM) hardware, specifically UPMEM's architecture, to accelerate ANNS. DRAM-PIM offers high memory bandwidth and large capacity but limited compute, so the authors use lookup tables (LUTs) to replace expensive multiplications with table lookups and tune the ANNS configuration so that its compute-to-I/O ratio matches that of the hardware. They further design a two-level load-balancing strategy that combines static data layout with dynamic request scheduling, and implement parallel execution that is aware of each processing unit's private local memory. Experiments on representative datasets show an average speedup of 2.92× over a 32-thread CPU baseline.

📝 Abstract
Approximate Nearest Neighbor Search (ANNS), which enables efficient semantic similarity search in large datasets, has become a fundamental component of critical applications such as information retrieval and retrieval-augmented generation (RAG). However, ANNS is a well-known I/O-intensive algorithm with a low compute-to-I/O ratio, often requiring massive storage due to the large volume of high-dimensional data. This leads to I/O bottlenecks on CPUs and memory limitations on GPUs. DRAM-based Processing-in-Memory (DRAM-PIM) architecture, which offers high bandwidth, large-capacity memory, and the ability to perform efficient computation in or near the data, presents a promising solution for ANNS. In this work, we investigate the use of commercial DRAM-PIM for ANNS for the first time and propose DRIM-ANN, an optimized ANNS engine based on DRAM-PIMs from UPMEM. Notably, given that the target DRAM-PIM exhibits an even lower compute-to-I/O ratio than basic ANNS, we leverage lookup tables (LUTs) to replace more multiplications with I/O operations. We then systematically tune ANNS to search optimized configurations with lower computational load, aligning the compute-to-I/O ratio of ANNS with that of DRAM-PIMs while maintaining accuracy constraints. Building on this tuned ANNS algorithm, we further explore implementation optimizations to fully utilize the two thousand parallel processing units with private local memory in DRAM-PIMs. To address the load imbalance caused by ANNS requests distributed across different clusters of large datasets, we propose a load-balancing strategy that combines static data layout optimization with dynamic runtime request scheduling. Experimental results on representative datasets show that DRIM-ANN achieves an average performance speedup of 2.92x compared to a 32-thread CPU counterpart.
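The LUT idea in the abstract can be illustrated with a minimal sketch. It assumes vectors quantized to 8-bit unsigned integers and a squared-L2 distance; both are assumptions made for illustration, and the names `SQUARE_LUT`, `l2_squared_lut`, and `l2_squared_mul` are hypothetical rather than taken from the paper. The table is built once, after which each distance needs only subtractions, lookups, and additions.

```python
# Illustrative sketch only: assumes 8-bit quantized vectors (values 0..255).
# SQUARE_LUT[d] stores d*d for every possible absolute difference d,
# so the multiplications happen once at table-build time.
SQUARE_LUT = [d * d for d in range(256)]


def l2_squared_lut(a, b):
    """Squared L2 distance using subtraction, table lookup, and addition."""
    return sum(SQUARE_LUT[abs(x - y)] for x, y in zip(a, b))


def l2_squared_mul(a, b):
    """Reference version that multiplies once per dimension."""
    return sum((x - y) * (x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    query = [0, 10, 255]
    point = [255, 7, 0]
    # Both versions should agree on the distance.
    print(l2_squared_lut(query, point), l2_squared_mul(query, point))
```

The trade-off in this sketch is one extra memory read per dimension in exchange for one fewer multiplication, which is the direction the abstract describes for hardware with abundant bandwidth and weak arithmetic.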
Problem

Research questions and friction points this paper is trying to address.

Optimizing ANNS for DRAM-PIM platforms with limited compute
Addressing I/O bottlenecks and memory constraints in nearest neighbor search
Enhancing parallel processing efficiency on UPMEM's DRAM-PIM architecture
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages UPMEM DRAM-PIM for ANNS acceleration
Replaces squaring operations with lookup tables
Implements load-balancing and I/O optimization strategies
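The combination of static data layout and dynamic request scheduling can be sketched as follows. This is not the paper's algorithm: the static step uses a generic greedy heaviest-first placement, and the dynamic step assumes that hot clusters are replicated on several processing units so a request can go to the least-loaded replica. The function names, the replication assumption, and the load estimates are all hypothetical.

```python
import heapq


def static_layout(cluster_loads, num_units):
    """Greedy placement: heaviest cluster first, onto the lightest unit so far."""
    heap = [(0, unit) for unit in range(num_units)]
    heapq.heapify(heap)
    placement = {}
    for cid, load in sorted(cluster_loads.items(), key=lambda kv: -kv[1]):
        total, unit = heapq.heappop(heap)
        placement[cid] = unit
        heapq.heappush(heap, (total + load, unit))
    return placement


def dynamic_schedule(requests, replicas, cost):
    """Route each request (a cluster id) to the least-loaded unit holding it."""
    queued = {}      # unit -> work assigned so far
    assignment = []  # (cluster id, chosen unit) in request order
    for cid in requests:
        unit = min(replicas[cid], key=lambda u: queued.get(u, 0))
        queued[unit] = queued.get(unit, 0) + cost[cid]
        assignment.append((cid, unit))
    return assignment, queued


if __name__ == "__main__":
    # Hypothetical per-cluster load estimates (e.g. size times access frequency).
    print(static_layout({"a": 8, "b": 7, "c": 3, "d": 2}, num_units=2))
    # Hypothetical replica map: cluster "a" lives on units 0 and 1.
    print(dynamic_schedule(["a", "a", "b", "a"],
                           {"a": [0, 1], "b": [1]},
                           {"a": 8, "b": 7}))
```

The static step fixes where data lives before any query arrives, while the dynamic step reacts to the actual request mix at runtime; the two are kept as separate functions here to mirror that split.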
Mingkai Chen
Institute of Computing Technology, Chinese Academy of Sciences and University of Chinese Academy of Sciences
Tianhua Han
Institute of Computing Technology, Chinese Academy of Sciences and University of Chinese Academy of Sciences
Cheng Liu
Institute of Computing Technology, Chinese Academy of Sciences and University of Chinese Academy of Sciences
Shengwen Liang
Institute of Computing Technology, Chinese Academy of Sciences
Accelerator, Cognitive SSD, System
Kuai Yu
Institute of Computing Technology, Chinese Academy of Sciences and University of Chinese Academy of Sciences
Lei Dai
Institute of Computing Technology, Chinese Academy of Sciences and University of Chinese Academy of Sciences
Ziming Yuan
Institute of Computing Technology, Chinese Academy of Sciences and University of Chinese Academy of Sciences
Ying Wang
Institute of Computing Technology, Chinese Academy of Sciences and University of Chinese Academy of Sciences
Lei Zhang
Institute of Computing Technology, Chinese Academy of Sciences and University of Chinese Academy of Sciences
Huawei Li
Institute of Computing Technology, Chinese Academy of Sciences
computer engineering
Xiaowei Li
Institute of Computing Technology, Chinese Academy of Sciences and University of Chinese Academy of Sciences