🤖 AI Summary
This work addresses the inefficiency of conventional processors at the high-dimensional, sparse vector operations of Hyperdimensional Computing (HDC), which are bottlenecked by memory bandwidth and struggle to meet real-time performance constraints. The authors propose a spatially aware image-patch encoding method that maps local image patches to hypervectors enriched with spatial information and fuses them into a global representation using the fundamental HDC primitives: binding, permutation, bundling, and similarity search. To accelerate these operations, they design an end-to-end pipelined FPGA accelerator that exploits both dimensional and patch-level parallelism. Evaluated on MNIST and Fashion-MNIST, the approach achieves classification accuracies of 95.67% and 85.14%, respectively, with an inference latency of only 0.09 ms, yielding speedups of up to 1,300× and 60× over CPU and GPU baselines.
📝 Abstract
Hyperdimensional Computing (HDC) represents data using extremely high-dimensional, low-precision vectors, termed hypervectors (HVs), and performs learning and inference through lightweight, noise-tolerant operations. However, the high dimensionality, sparsity, and repeated data movement involved in HDC make these computations difficult to accelerate efficiently on conventional processors. As a result, executing the core HDC operations (binding, permutation, bundling, and similarity search) on CPUs or GPUs often leads to suboptimal utilization, memory bottlenecks, and limited real-time performance. In this paper, our contributions are two-fold. First, we develop an image-encoding algorithm that, similar in spirit to convolutional neural networks, maps local image patches to hypervectors enriched with spatial information. These patch-level hypervectors are then merged into a global representation using the fundamental HDC operations, enabling spatially sensitive and robust image encoding. This encoder achieves 95.67% accuracy on MNIST and 85.14% on Fashion-MNIST, outperforming prior HDC-based image encoders. Second, we design an end-to-end accelerator that implements these compute operations on an FPGA through a pipelined architecture exploiting parallelism both across the hypervector dimensionality and across the set of image patches. Our Alveo U280 implementation delivers 0.09 ms inference latency, achieving up to 1300× and 60× speedups over state-of-the-art CPU and GPU baselines, respectively.
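To make the patch-based encoding concrete, the following is a hedged sketch of how such an encoder might combine the primitives: bind an intensity HV to a position HV for each pixel, bundle within a patch, permute each patch HV by its spatial index, and bundle across patches. The patch size, dimensionality, codebook construction, and all names are illustrative assumptions, not the paper's actual design.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 10_000    # hypervector dimensionality (illustrative)
PATCH = 7     # patch side length (illustrative; not from the paper)
LEVELS = 256  # grayscale intensity levels

# Random bipolar codebooks: one HV per intensity, one per in-patch position.
value_hvs = rng.choice([-1, 1], size=(LEVELS, D)).astype(np.int8)
pos_hvs = rng.choice([-1, 1], size=(PATCH * PATCH, D)).astype(np.int8)

def encode_patch(patch):
    """Bind each pixel's intensity HV to its position HV, then bundle."""
    flat = patch.reshape(-1)                    # PATCH*PATCH intensity indices
    bound = value_hvs[flat] * pos_hvs           # binding: elementwise product
    return np.sign(bound.sum(axis=0)).astype(np.int8)  # bundling: majority

def encode_image(img):
    """Permute each patch HV by its spatial index, then bundle globally."""
    h, w = img.shape
    patch_hvs, idx = [], 0
    for r in range(0, h - PATCH + 1, PATCH):
        for c in range(0, w - PATCH + 1, PATCH):
            hv = encode_patch(img[r:r + PATCH, c:c + PATCH])
            patch_hvs.append(np.roll(hv, idx))  # permutation injects location
            idx += 1
    total = np.sum(patch_hvs, axis=0)
    return np.where(total >= 0, 1, -1).astype(np.int8)  # tie-broken majority
```

In this form, each pixel contributes a quasi-orthogonal bound term, so two patches with the same pixels in different positions yield dissimilar HVs, which is the spatial sensitivity the encoder is after. The per-patch loop is also what the FPGA pipeline can parallelize, alongside the elementwise work across the D dimensions.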