🤖 AI Summary
Heterogeneous CPU architectures in cloud environments lack systematic, cross-vendor cost-performance evaluations for vector search, hindering optimal hardware selection. Method: We introduce the first comprehensive benchmarking framework for vector search across multi-generation CPUs—AMD (Zen 4), Intel (Sapphire Rapids), and AWS Graviton (v3/v4)—built upon Faiss and Annoy. It rigorously evaluates IVF and HNSW index structures under float32 and int8 quantization, measuring queries-per-second (QPS) and cost-efficiency (queries-per-dollar, QP$). Contribution/Results: Our analysis uncovers asymmetric microarchitectural impacts: Zen 4 achieves up to 3× higher QPS than Sapphire Rapids on IVF, yet underperforms on HNSW; Graviton3 attains the highest QP$ across most configurations—even surpassing Graviton4. These findings provide empirically grounded, reproducible guidance for hardware selection in cloud-based vector databases.
📝 Abstract
Vector databases have emerged as a new type of system that supports efficient querying of high-dimensional vectors. Many vendors offer their database as a service in the cloud. However, the variety of available CPUs and the lack of vector search benchmarks across CPUs make it difficult for users to choose one. In this study, we show that CPU microarchitectures available in the cloud perform significantly differently across vector search scenarios. For instance, on an IVF index over float32 vectors, AMD's Zen 4 delivers almost 3x more queries per second (QPS) than Intel's Sapphire Rapids, but for HNSW indexes, the tables turn. However, when looking at the number of queries per dollar (QP$), Graviton3 is the best option for most indexes and quantization settings, even over Graviton4 (Table 1). With this work, we hope to guide users in getting the best "bang for the buck" when deploying vector search systems.
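To make the cost-efficiency metric concrete, the sketch below derives queries-per-dollar (QP$) from a sustained QPS measurement and an instance's hourly on-demand price. The QPS values and prices are illustrative placeholders, not measurements or prices from the paper; they only demonstrate how a cheaper instance can win on QP$ despite a lower QPS.

```python
# Sketch of the QP$ (queries-per-dollar) metric: convert a sustained
# queries-per-second rate into queries per dollar of compute, given the
# instance's hourly on-demand price.

def queries_per_dollar(qps: float, hourly_price_usd: float) -> float:
    """Queries served per dollar spent on the instance."""
    queries_per_hour = qps * 3600  # 3600 seconds per hour
    return queries_per_hour / hourly_price_usd

# Hypothetical numbers (NOT from the paper): a slower but cheaper
# instance can still come out ahead on cost efficiency.
graviton3_qpd = queries_per_dollar(qps=8_000, hourly_price_usd=0.58)
zen4_qpd = queries_per_dollar(qps=10_000, hourly_price_usd=0.86)
print(f"Graviton3: {graviton3_qpd:,.0f} QP$")
print(f"Zen 4:     {zen4_qpd:,.0f} QP$")
```

This is the arithmetic behind the abstract's observation that the highest-QPS CPU is not necessarily the most cost-effective one.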