AI Summary
To meet the low-latency requirements of graph-based vector search (GVS) in large language model (LLM) serving, search engines, and recommender systems, this work proposes a hardware-algorithm co-design. First, it introduces Delayed-Synchronization Traversal (DST), a graph traversal algorithm that relaxes the strict traversal order so that the accelerator's parallel hardware stays busy. Second, it presents Falcon, a GVS accelerator that implements high-performance GVS operators and uses an on-chip Bloom filter to track search states, reducing memory accesses. Evaluated as an FPGA prototype across diverse graphs and datasets, Falcon with DST achieves up to 4.3× and 19.5× lower latency than CPU- and GPU-based GVS systems, respectively, and up to 8.0× and 26.9× higher energy efficiency.
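Graph search must remember which nodes it has already visited so it does not fetch and score them twice. The abstract states that Falcon tracks search states with an on-chip Bloom filter but does not give its parameters or hash functions. The sketch below is a generic Bloom filter used as an approximate visited set, not Falcon's hardware design; the class name, `num_bits`, `num_hashes`, and the SHA-256 double-hashing scheme are illustrative assumptions.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Fixed-size Bloom filter used as an approximate visited set (illustrative sketch)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # Bit array packed into bytes, all zero initially.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, node_id: int) -> Iterator[int]:
        # Derive k bit positions from one digest via double hashing: h1 + i * h2.
        # Assumes 0 <= node_id < 2**64.
        digest = hashlib.sha256(node_id.to_bytes(8, "little")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # odd step size
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, node_id: int) -> None:
        # Mark the node as visited by setting all k of its bits.
        for pos in self._positions(node_id):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def might_contain(self, node_id: int) -> bool:
        # False means "definitely not visited"; True may be a false positive.
        return all(self.bits[pos >> 3] & (1 << (pos & 7)) for pos in self._positions(node_id))


visited = BloomFilter(num_bits=1 << 16, num_hashes=3)
for node in (3, 17, 42):
    visited.add(node)
print(visited.might_contain(17))  # True: a Bloom filter has no false negatives
```

In a traversal, a false positive makes an unvisited node look visited, so it is skipped: the search never re-fetches a node, but it may occasionally lose recall. The filter's size and number of hash functions trade on-chip memory against that error rate.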
Abstract
Vector search systems are indispensable in large language model (LLM) serving, search engines, and recommender systems, where minimizing online search latency is essential. Among the available algorithms, graph-based vector search (GVS) is particularly popular because of its high search performance and quality. To serve GVS efficiently at low latency, we propose a hardware-algorithm co-design comprising Falcon, a GVS accelerator, and Delayed-Synchronization Traversal (DST), an accelerator-optimized graph traversal algorithm. Falcon implements high-performance GVS operators and uses an on-chip Bloom filter that tracks search states to reduce memory accesses. DST improves search performance and quality by relaxing the graph traversal order to maximize accelerator utilization. Evaluation across various graphs and datasets shows that our Falcon prototype on FPGAs, coupled with DST, achieves up to 4.3× and 19.5× lower latency and up to 8.0× and 26.9× higher energy efficiency than CPU- and GPU-based GVS systems, respectively. The efficiency of Falcon and DST demonstrates their potential to become standard solutions for future GVS acceleration.
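The abstract says DST relaxes the graph traversal order to keep the accelerator busy, but it does not give the algorithm. The following is a hypothetical sketch, not the authors' DST: a standard best-first graph search (the kind used by HNSW-style indexes) generalized so that each step pops up to `batch` closest candidates, scores all of their unvisited neighbors, and only then synchronizes the candidate and result queues. With `batch=1` it reduces to ordinary best-first search. A Python `set` stands in for Falcon's Bloom filter; the function names, `ef` (result-queue size), and `batch` are illustrative assumptions.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def l2sq(a: Sequence[float], b: Sequence[float]) -> float:
    # Squared Euclidean distance between two vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def relaxed_best_first_search(
    graph: Dict[int, List[int]],
    vectors: Dict[int, Sequence[float]],
    query: Sequence[float],
    entry: int,
    ef: int,
    batch: int,
) -> List[Tuple[float, int]]:
    """Best-first search that expands up to `batch` candidates per synchronization step."""
    visited = {entry}
    d0 = l2sq(vectors[entry], query)
    candidates = [(d0, entry)]  # min-heap of (distance, node) still to expand
    results = [(-d0, entry)]    # max-heap (negated distances) of the best `ef` nodes
    while candidates:
        worst = -results[0][0]
        # Pop up to `batch` closest candidates; stop once none can improve the results.
        expand = []
        while candidates and len(expand) < batch:
            d, v = heapq.heappop(candidates)
            if d > worst and len(results) >= ef:
                break
            expand.append(v)
        if not expand:
            break
        # Score every unvisited neighbor of the whole batch (independent, parallelizable work).
        new = []
        for v in expand:
            for u in graph.get(v, []):
                if u not in visited:
                    visited.add(u)
                    new.append((l2sq(vectors[u], query), u))
        # Synchronize once per batch: merge the new distances into both queues.
        for du, u in new:
            if len(results) < ef or du < -results[0][0]:
                heapq.heappush(candidates, (du, u))
                heapq.heappush(results, (-du, u))
                if len(results) > ef:
                    heapq.heappop(results)
    return sorted((-nd, v) for nd, v in results)


if __name__ == "__main__":
    # Toy 1-D path graph 0-1-...-9 where node i sits at coordinate i.
    vecs = {i: [float(i)] for i in range(10)}
    g = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
    print(relaxed_best_first_search(g, vecs, [7.2], entry=0, ef=3, batch=2))
```

In this sketch, larger `batch` values cut the number of synchronization points (queue updates and termination checks) at the cost of sometimes expanding candidates that strict best-first order would have pruned, which is the kind of ordering relaxation the abstract describes.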