KBest: Efficient Vector Search on Kunpeng CPU

📅 2025-08-04
🤖 AI Summary
The Kunpeng 920, a high-performance ARM-based CPU, lacks vector search libraries optimized for its architecture, because existing libraries (e.g., Faiss, DiskANN) target x86 CPUs. Method: The paper presents KBest, a vector search library tailored to the Kunpeng 920. It combines hardware-aware and algorithmic optimizations: SIMD-accelerated distance computation, data prefetching, index refinement, early termination, and vector quantization. Contribution/Results: In the reported experiments, KBest outperforms state-of-the-art vector search libraries running on x86 CPUs, and the optimizations improve its query throughput by over 2×. KBest currently serves both internal applications and external enterprise clients, handling tens of millions of queries per day.

📝 Abstract
Vector search, which returns the vectors most similar to a given query vector from a large vector dataset, underlies many important applications such as search, recommendation, and LLMs. To be economical, vector search needs to be efficient so as to reduce the resources required by a given query workload. However, existing vector search libraries (e.g., Faiss and DiskANN) are optimized for x86 CPU architectures (i.e., Intel and AMD CPUs), while Huawei Kunpeng CPUs are based on the ARM architecture and are competitive in compute power. In this paper, we present KBest, a vector search library tailored for the latest Kunpeng 920 CPUs. To be efficient, KBest incorporates extensive hardware-aware and algorithmic optimizations, which include single-instruction-multiple-data (SIMD) accelerated distance computation, data prefetching, index refinement, early termination, and vector quantization. Experimental results show that KBest outperforms SOTA vector search libraries running on x86 CPUs, and that our optimizations can improve the query throughput by over 2x. Currently, KBest serves applications from both our internal business and external enterprise clients with tens of millions of queries on a daily basis.
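To make the task in the abstract concrete, here is a minimal sketch of exact (brute-force) vector search in plain Python. It is not KBest's API or algorithm: the function names `squared_l2` and `brute_force_search`, the choice of squared Euclidean distance, and the toy data are all assumptions made for illustration. Libraries such as the ones named above exist precisely because this linear scan becomes too expensive on large datasets.

```python
import heapq
from typing import List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_search(
    dataset: Sequence[Sequence[float]], query: Sequence[float], k: int
) -> List[Tuple[float, int]]:
    """Return the k (distance, index) pairs closest to the query, nearest first.

    Hypothetical helper: scans every vector, so cost grows linearly
    with the dataset size.
    """
    scored = ((squared_l2(vec, query), idx) for idx, vec in enumerate(dataset))
    return heapq.nsmallest(k, scored)


# Toy data, invented for this example.
dataset = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.5, 0.0]]
top2 = brute_force_search(dataset, [0.0, 0.1], k=2)
print([idx for _, idx in top2])
```

The distance function is the inner loop of the whole search, which is why the abstract lists SIMD-accelerated distance computation first among its optimizations.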
Problem

Research questions and friction points this paper is trying to address.

Optimize vector search for ARM-based Kunpeng CPUs
Enhance efficiency via hardware-aware algorithmic improvements
Outperform x86-optimized libraries like Faiss and DiskANN
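As one illustration of the kind of algorithmic improvement listed above, the sketch below abandons a distance computation as soon as the partial sum already exceeds the current k-th best distance. The text here does not say which form of early termination KBest uses, so treat this as a generic, commonly used technique rather than the paper's method. All names (`squared_l2_bounded`, `search_with_pruning`) and the toy data are hypothetical.

```python
import heapq
from typing import List, Optional, Sequence, Tuple


def squared_l2_bounded(
    a: Sequence[float], b: Sequence[float], bound: float
) -> Optional[float]:
    """Squared L2 distance, or None if the partial sum exceeds `bound`."""
    total = 0.0
    for x, y in zip(a, b):
        total += (x - y) ** 2
        if total > bound:
            return None  # cannot enter the top-k, stop early
    return total


def search_with_pruning(
    dataset: Sequence[Sequence[float]], query: Sequence[float], k: int
) -> List[Tuple[float, int]]:
    """Exact top-k scan that skips the tail of hopeless distance computations."""
    heap: List[Tuple[float, int]] = []  # max-heap via negated distances
    for idx, vec in enumerate(dataset):
        # Once k candidates are held, the worst of them is the pruning bound.
        bound = -heap[0][0] if len(heap) == k else float("inf")
        dist = squared_l2_bounded(vec, query, bound)
        if dist is None:
            continue
        if len(heap) < k:
            heapq.heappush(heap, (-dist, idx))
        else:
            heapq.heapreplace(heap, (-dist, idx))
    return sorted((-neg, idx) for neg, idx in heap)


# Toy data, invented for this example.
dataset = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.5, 0.0]]
print([idx for _, idx in search_with_pruning(dataset, [0.0, 0.1], k=2)])
```

The pruning never discards a true top-k candidate, because a vector is dropped only when its distance is already known to exceed the current k-th best.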
Innovation

Methods, ideas, or system contributions that make the work stand out.

SIMD-accelerated distance computation on ARM
Hardware-aware index refinement techniques
Vector quantization for efficient storage
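The storage benefit of quantization can be sketched with a simple 8-bit scalar quantizer, which stores each float component as one byte instead of four. This is an assumption chosen for brevity: the text above does not say which quantization scheme KBest uses, and the function names, the fixed value range `[lo, hi]`, and the sample vector are hypothetical.

```python
from typing import List, Sequence


def quantize(vec: Sequence[float], lo: float, hi: float) -> List[int]:
    """Map each component from [lo, hi] to an integer code in 0..255."""
    scale = (hi - lo) / 255.0
    # Clamp so that out-of-range inputs still yield a valid byte code.
    return [min(255, max(0, round((x - lo) / scale))) for x in vec]


def dequantize(codes: Sequence[int], lo: float, hi: float) -> List[float]:
    """Approximate reconstruction of the original components."""
    scale = (hi - lo) / 255.0
    return [lo + c * scale for c in codes]


# Sample vector, invented for this example.
vec = [-1.0, -0.2, 0.0, 0.7, 1.0]
codes = quantize(vec, lo=-1.0, hi=1.0)
approx = dequantize(codes, lo=-1.0, hi=1.0)
max_error = max(abs(a - b) for a, b in zip(vec, approx))
print(codes[0], codes[-1], max_error <= (2.0 / 255.0) / 2 + 1e-9)
```

For in-range inputs the reconstruction error per component is at most half a quantization step, which is the trade-off that buys the 4x reduction in storage relative to 32-bit floats.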
Authors

Kaihao Ma, Huawei Technologies Ltd.
Meiling Wang, Huawei Technologies Ltd.
Senkevich Oleg, Huawei Technologies Ltd.
Zijian Li, Huawei Technologies Ltd.
Daihao Xue, Huawei Technologies Ltd.
Dmitriy Malyshev, Higher School of Economics
Yangming Lv, Huawei Technologies Ltd.
Shihai Xiao, Huawei Technologies Ltd.
Xiao Yan, Wuhan University
Radionov Alexander, Huawei Technologies Ltd.
Weidi Zeng, Huawei Technologies Ltd.
Yuanzhan Gao, Huawei Technologies Ltd.
Zhiyu Zou, Huawei Technologies Ltd.
Xin Yao, Huawei Technologies Ltd.
Lin Liu, Huawei Technologies Ltd.
Junhao Wu, Towson University. Research interests: Computer Vision, Cryo-EM, Medical Image
Yiding Liu, TikTok
Yaoyao Fu, Huawei Technologies Ltd.
Gongyi Wang, Huawei Technologies Ltd.
Gong Zhang, Huawei Technologies Ltd.
Fei Yi, Huawei Technologies Ltd.
Yingfan Liu, Xidian University. Research interests: Vector Database, High-performance Computations