🤖 AI Summary
Neurosymbolic AI suffers from memory-bound execution, heterogeneous compute demands, and irregular memory access patterns on general-purpose hardware (CPU/GPU/TPU), resulting in low resource utilization and high inference latency. This paper proposes CogSys, an algorithm-hardware co-design framework tailored to neurosymbolic AI, comprising a factorization-based computation optimization, reconfigurable neuro/symbolic processing elements (nsPE), a bubble streaming (BS) dataflow with spatial-temporal (ST) mapping, and an adaptive workload-aware scheduler (adSCH), benchmarked as a custom accelerator under the TSMC 28 nm technology node. Experimental results demonstrate over 75× speedup compared with a TPU-like systolic array (at <5% area overhead) and 4–96× speedup over desktop and edge GPUs. The accelerator enables, for the first time, real-time abductive reasoning at 0.3 s per task, within a 4 mm² die area and a 1.48 W power budget.
📝 Abstract
Neurosymbolic AI is an emerging compositional paradigm that fuses neural learning with symbolic reasoning to enhance the transparency, interpretability, and trustworthiness of AI. It also exhibits higher data efficiency, making it promising for edge deployments. Despite these algorithmic promises and demonstrations, executing neurosymbolic workloads on current hardware (CPU/GPU/TPU) is unfortunately challenging due to higher memory intensity, greater compute heterogeneity, and access-pattern irregularity, leading to severe hardware underutilization. This work proposes CogSys, a characterization and co-design framework dedicated to neurosymbolic AI system acceleration, aiming to deliver both reasoning efficiency and scalability. On the algorithm side, CogSys proposes an efficient factorization technique to alleviate compute and memory overhead. On the hardware side, CogSys proposes a scalable neurosymbolic architecture with reconfigurable neuro/symbolic processing elements (nsPE) and a bubble streaming (BS) dataflow with spatial-temporal (ST) mapping for highly parallel and efficient neurosymbolic computation. On the system side, CogSys features an adaptive workload-aware scheduler (adSCH) to orchestrate heterogeneous kernels and enhance resource utilization. Evaluated across cognitive workloads, CogSys enables reconfigurable support for neural and symbolic kernels and exhibits >75× speedup over a TPU-like systolic array with only <5% area overhead, as benchmarked under the TSMC 28 nm technology node. CogSys achieves 4×–96× speedup compared to desktop and edge GPUs. For the first time, CogSys enables real-time abductive reasoning toward human fluid intelligence, requiring only 0.3 s per reasoning task with 4 mm² area and 1.48 W power consumption.
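The abstract names a factorization technique but does not detail it. As a loose illustration of why factorizing a symbolic kernel pays off, the sketch below uses a resonator-network-style decoding in a vector-symbolic setting: recovering the components of a bound code naively enumerates all n^k combinations, while a factorized coordinate-descent search touches only about k·n codewords per sweep. Every size, codebook, and the algorithm itself are assumptions for illustration, not CogSys's actual kernels.

```python
import numpy as np
from itertools import product

# Illustrative only: bipolar (+/-1) codebooks for k attributes, bound by
# elementwise (Hadamard) product -- a common vector-symbolic choice.
rng = np.random.default_rng(0)
d, n, k = 8192, 10, 3                    # vector dim, codebook size, #factors
codebooks = [rng.choice([-1.0, 1.0], size=(n, d)) for _ in range(k)]

true_idx = (5, 7, 2)
target = np.prod([cb[i] for cb, i in zip(codebooks, true_idx)], axis=0)

def naive_search(target):
    """Enumerate all n**k bound combinations: O(n**k * d) work."""
    score = lambda combo: np.prod(
        [codebooks[j][combo[j]] for j in range(k)], axis=0) @ target
    return max(product(range(n), repeat=k), key=score)

def factorized_search(target, sweeps=5):
    """Coordinate-descent factorization: roughly O(k * n * d) per sweep."""
    # Initialize each estimate as the bipolarized superposition of its codebook.
    est = [np.sign(cb.sum(axis=0)) for cb in codebooks]
    idx = [0] * k
    for _ in range(sweeps):
        for j in range(k):
            others = np.prod([est[m] for m in range(k) if m != j], axis=0)
            unbound = target * others        # unbinding: x * x = 1 for +/-1 codes
            idx[j] = int((codebooks[j] @ unbound).argmax())
            est[j] = codebooks[j][idx[j]]    # snap to the best-matching codeword
    return tuple(idx)

print(naive_search(target))       # brute force over n**k = 1000 combinations
print(factorized_search(target))  # same combination via per-factor matching
```

Both searches recover the same attribute combination, but the factorized one replaces the exponential enumeration with a handful of matrix-vector products per factor, which is the kind of compute/memory reduction a hardware dataflow can then exploit.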