🤖 AI Summary
Existing hardware (CPUs, GPUs, TPUs) struggles to efficiently accelerate neuro-symbolic AI (NSAI) because of its inherent computational heterogeneity, memory intensity, and irregular memory access patterns—characteristics poorly supported by mainstream AI accelerators, which lack adaptability to NSAI's diverse operations and variable-scale workloads. To address this, we propose NSFlow, the first end-to-end FPGA acceleration framework tailored to NSAI. Our approach features a data-dependency-driven architecture generator that synthesizes a reconfigurable computing array with flexible functional units, reconfigurable memory hierarchies, and mixed-precision support, coupled with dynamic memory topology reconfiguration. Evaluation shows speedups of 31× over Jetson TX2, >2× over GPUs, 8× over TPU-style systolic arrays, and >3× over the Xilinx DPU. Notably, when the symbolic workload scales by 150×, execution time increases by only 4×—enabling, for the first time, real-time acceleration of general-purpose NSAI algorithms.
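The data-dependency-driven generator idea can be illustrated with a toy sketch: extract the workload's operation dependency graph, derive a legal execution order, and map each kernel class (neural vs. symbolic) to a unit configuration and precision. All names here are hypothetical illustrations of the concept; the paper's actual generator emits FPGA dataflow architectures, not Python dictionaries.

```python
# Toy sketch of a data-dependency-driven architecture generator
# (hypothetical names; illustrates the concept, not NSFlow's implementation).
from collections import deque

def topo_schedule(ops, deps):
    """Order ops so every op runs after the ops it depends on (Kahn's algorithm)."""
    indeg = {op: 0 for op in ops}
    children = {op: [] for op in ops}
    for op, parents in deps.items():
        for p in parents:
            children[p].append(op)
            indeg[op] += 1
    ready = deque(op for op in ops if indeg[op] == 0)
    order = []
    while ready:
        op = ready.popleft()
        order.append(op)
        for child in children[op]:
            indeg[child] -= 1
            if indeg[child] == 0:
                ready.append(child)
    return order

def assign_units(ops):
    """Map each kernel class to a flexible-unit config and precision."""
    table = {
        "neural":   ("systolic_tile", "int8"),   # dense GEMM-style kernels
        "symbolic": ("vector_lane",   "int32"),  # sparse/irregular reasoning
    }
    return {name: table[kind] for name, kind in ops.items()}

# A tiny mixed neural/symbolic workload.
ops = {"conv1": "neural", "conv2": "neural", "bind": "symbolic", "reason": "symbolic"}
deps = {"conv2": ["conv1"], "bind": ["conv2"], "reason": ["bind"]}

print(topo_schedule(list(ops), deps))  # dependency-respecting execution order
print(assign_units(ops)["reason"])     # unit/precision chosen for a symbolic op
```

In a real flow, the schedule and per-op unit assignments would drive hardware synthesis (array shape, memory organization, precision per lane) rather than a runtime mapping.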
📝 Abstract
Neuro-Symbolic AI (NSAI) is an emerging paradigm that integrates neural networks with symbolic reasoning to enhance the transparency, reasoning capabilities, and data efficiency of AI systems. Recent NSAI systems have gained traction due to their exceptional performance in reasoning tasks and human-AI collaborative scenarios. Despite these algorithmic advancements, executing NSAI tasks on existing hardware (e.g., CPUs, GPUs, TPUs) remains challenging due to their heterogeneous computing kernels, high memory intensity, and unique memory access patterns. Moreover, current NSAI algorithms exhibit significant variation in operation types and scales, making them incompatible with existing ML accelerators. These challenges highlight the need for a versatile and flexible acceleration framework tailored to NSAI workloads. In this paper, we propose NSFlow, an FPGA-based acceleration framework designed to achieve high efficiency, scalability, and versatility across NSAI systems. NSFlow features a design architecture generator that identifies workload data dependencies and creates optimized dataflow architectures, as well as a reconfigurable array with flexible compute units, re-organizable memory, and mixed-precision capabilities. Evaluated across NSAI workloads, NSFlow achieves a 31x speedup over Jetson TX2, more than 2x over a GPU, 8x over a TPU-like systolic array, and more than 3x over the Xilinx DPU. NSFlow also demonstrates enhanced scalability, with only a 4x runtime increase when symbolic workloads scale by 150x. To the best of our knowledge, NSFlow is the first framework to enable real-time acceleration of generalizable NSAI algorithms, demonstrating a promising solution for next-generation cognitive systems.