🤖 AI Summary
Problem: Distributed vector database performance on HPC platforms lacks systematic empirical characterization, particularly in scientific computing scenarios.
Method: This paper presents the first large-scale empirical evaluation of Qdrant on the Polaris supercomputer, using embeddings generated with Qwen3-Embedding-4B and a biomedical text workload drawn from BV-BRC, and quantitatively measures insertion latency, index construction time, and query latency at scale (32 nodes).
Contribution/Results: The study identifies critical bottlenecks—including network communication overhead, I/O scheduling inefficiencies, and GPU-CPU coordination constraints—and proposes HPC-aware configuration optimizations for vector databases. It delivers the first publicly available, reproducible benchmark for distributed vector databases in AI-native scientific computing environments, providing actionable insights and performance baselines for designing cross-architecture vector retrieval systems.
📝 Abstract
Vector databases have rapidly grown in popularity, enabling efficient similarity search over data such as text, images, and video. They now play a central role in modern AI workflows, aiding large language models by grounding model outputs in external literature through retrieval-augmented generation. Despite their importance, little is known about the performance characteristics of vector databases on the high-performance computing (HPC) systems that drive large-scale science. This work presents an empirical study of distributed vector database performance on the Polaris supercomputer at the Argonne Leadership Computing Facility. We construct a realistic biological-text workload from BV-BRC and generate embeddings from the peS2o corpus using Qwen3-Embedding-4B. We select Qdrant and evaluate insertion, index construction, and query latency with up to 32 workers. Informed by practical lessons from this experience, this work takes a first step toward characterizing vector database performance on HPC platforms to guide future research and optimization.
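The kind of per-operation latency measurement the abstract describes can be sketched with a small, stdlib-only harness. This is an illustrative stand-in, not the paper's benchmark: brute-force cosine search substitutes for a Qdrant query, and the vector dimension, corpus size, and query count are placeholder values (real Qwen3-Embedding-4B vectors are far higher-dimensional).

```python
import random
import statistics
import time

def cosine(a, b):
    # cosine similarity between two dense vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

DIM = 64          # placeholder dimension, not the real embedding size
N_DOCS = 500      # placeholder corpus size
N_QUERIES = 50    # placeholder query count

random.seed(0)
corpus = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_DOCS)]

# Time each query end to end; brute-force top-k search stands in for
# a call to a vector database such as Qdrant.
latencies = []
for _ in range(N_QUERIES):
    q = [random.gauss(0, 1) for _ in range(DIM)]
    t0 = time.perf_counter()
    top5 = sorted(range(N_DOCS), key=lambda i: -cosine(q, corpus[i]))[:5]
    latencies.append(time.perf_counter() - t0)

# Report the tail behavior that HPC benchmarks typically care about.
p50 = statistics.median(latencies)
p99 = statistics.quantiles(latencies, n=100)[98]
print(f"p50={p50 * 1e3:.3f} ms  p99={p99 * 1e3:.3f} ms")
```

The same pattern (wrap each operation in `time.perf_counter()` and aggregate percentiles) applies to insertion and index-construction timing; distributing it across workers is where the HPC-specific issues the study examines begin to appear.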