AI Summary
This work addresses the low energy efficiency and high latency of exact k-nearest neighbor (k-NN) search in high-dimensional latent spaces by moving the search onto FPGAs. We propose two large-scale, green retrieval schemes tailored for neural encoder representations, both sharing the same low-level FPGA configuration. The first scheme prioritizes throughput by processing the queries of a batch in parallel over a streamed dataset that does not fit into FPGA memory. The second prioritizes latency by processing each incoming query in parallel over an in-memory dataset. Together they cover both streaming and resident data scenarios. Experiments demonstrate that, while maintaining 100% recall, our methods achieve the best throughput and the best observed latency, with scale-up factors of up to 16.6× over state-of-the-art CPU implementations, along with up to 11.9× lower energy consumption. The core contribution is the first FPGA-based k-NN hardware acceleration framework that simultaneously guarantees lossless accuracy, adaptive deployment across diverse retrieval scenarios, and co-optimized energy efficiency and real-time performance.
Abstract
This paper investigates the use of FPGA devices for energy-efficient exact kNN search in high-dimensional latent spaces. The work follows a relevant trend that supports the increasing popularity of learned representations based on neural encoder models by making their large-scale adoption greener and more inclusive. The paper proposes two energy-efficient solutions that adopt the same low-level FPGA configuration. The first solution maximizes system throughput by processing the queries of a batch in parallel over a streamed dataset that does not fit into the FPGA memory. The second minimizes latency by processing each incoming kNN query in parallel over an in-memory dataset. Reproducible experiments on publicly available image and text datasets show that our solutions outperform state-of-the-art CPU-based competitors in throughput, latency, and energy consumption. Specifically, the proposed FPGA solutions achieve the best throughput in terms of queries per second and the best observed latency, with scale-up factors of up to 16.6×. The same holds for energy efficiency, where our solutions achieve energy savings of up to 11.9× with respect to strong CPU-based competitors.
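The two processing patterns described in the abstract can be illustrated in software. The block below is a minimal pure-Python sketch, not the paper's FPGA design. The function names (`knn_streamed_batch`, `knn_partitioned`), the squared Euclidean metric, and the chunking and partitioning parameters are all assumptions made for illustration, and the loops run sequentially where hardware would run them in parallel. Both functions stay exact: the streamed variant keeps a bounded top-k per query while the dataset passes through once, and the partitioned variant merges per-partition top-k lists, which always contain the global top-k.

```python
import heapq
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Neighbor = Tuple[float, int]  # (squared distance, dataset index)


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (assumed metric; the abstract names none)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_streamed_batch(queries: List[Vector],
                       chunks: List[List[Vector]],
                       k: int) -> List[List[Neighbor]]:
    """Throughput-style sketch: stream the dataset once for a whole query batch."""
    # One bounded heap per query. Items are (-distance, -index), so heap[0]
    # always holds the current worst candidate (largest distance).
    heaps: List[list] = [[] for _ in queries]
    offset = 0
    for chunk in chunks:  # each chunk stands in for a streamed block
        for j, vec in enumerate(chunk):
            idx = offset + j
            for q, heap in zip(queries, heaps):  # parallel in hardware
                item = (-sq_dist(q, vec), -idx)
                if len(heap) < k:
                    heapq.heappush(heap, item)
                elif item > heap[0]:
                    heapq.heapreplace(heap, item)
        offset += len(chunk)
    # Undo the negation and return neighbors sorted by (distance, index).
    return [sorted((-d, -i) for d, i in heap) for heap in heaps]


def knn_partitioned(query: Vector,
                    dataset: List[Vector],
                    k: int,
                    parts: int) -> List[Neighbor]:
    """Latency-style sketch: split a resident dataset, then merge local top-k."""
    size = max(1, -(-len(dataset) // parts))  # ceiling division, at least 1
    partial: List[Neighbor] = []
    for start in range(0, len(dataset), size):  # parallel in hardware
        block = dataset[start:start + size]
        partial.extend(heapq.nsmallest(
            k, ((sq_dist(query, v), start + j) for j, v in enumerate(block))))
    # The global top-k is a subset of the union of the local top-k lists.
    return heapq.nsmallest(k, partial)


if __name__ == "__main__":
    data = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0], [1.0, 1.0]]
    batch = [[0.0, 0.0], [3.0, 2.0]]
    print(knn_streamed_batch(batch, [data[:2], data[2:]], k=2))
    print([knn_partitioned(q, data, k=2, parts=3) for q in batch])
```

In this sketch the streamed variant amortizes a single pass over the data across the whole batch, which favors throughput, while the partitioned variant spreads the work for one query across independent partitions, which favors latency when those partitions are processed concurrently.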