🤖 AI Summary
To meet the demands of high-throughput, low-latency, and energy-efficient real-time event processing in automotive embedded systems, this paper proposes a hardware-software co-optimization framework targeting SoC FPGAs. We present what is, to the best of our knowledge, the first acceleration of PointNet++ on an SoC FPGA, and introduce EFGCN (Event-Based FPGA-accelerated Graph Convolutional Network), implemented on the Xilinx ZCU104 platform for real-time, partially asynchronous processing of continuous event streams. Our approach integrates model pruning, quantization, and a customized pipelined accelerator architecture, achieving a model size reduction of more than 100× relative to AEGNN. Experimental evaluation demonstrates a throughput of 13.3 MEPS and a latency of 4.47 ms, with accuracy decreases of only 2.3% and 1.7% on the N-Caltech101 and N-Cars benchmarks, respectively. To the best of our knowledge, this is also the first hardware architecture exploration of GCNs for real-time continuous event data processing. The software and hardware source code will be published in an open repository upon acceptance. Our framework provides an efficient, edge-deployable approach to event-driven perception.
📝 Abstract
The use of event cameras is an important and rapidly evolving trend aimed at addressing the constraints of traditional video systems. These cameras are particularly relevant in the automotive domain, where their lower latency and energy consumption make them well suited to integration into embedded real-time systems. One effective way to ensure the necessary throughput and latency in event processing systems is to use graph convolutional networks (GCNs). In this study, we introduce a series of hardware-aware optimisations tailored for PointNet++, a GCN architecture designed for point cloud processing. The proposed techniques yield a more than 100-fold reduction in model size compared to Asynchronous Event-based GNN (AEGNN), one of the most recent works in the field, with a relatively small decrease in accuracy (2.3% for N-Caltech101 classification, 1.7% for N-Cars classification), thus following the TinyML trend. Building on this software research, we designed a custom EFGCN (Event-Based FPGA-accelerated Graph Convolutional Network) and implemented it on the ZCU104 SoC FPGA platform, achieving a throughput of 13.3 million events per second (MEPS) and real-time, partially asynchronous processing with a latency of 4.47 ms. We also address the scalability of the proposed hardware model as a means of improving the obtained accuracy. To the best of our knowledge, this study is the first attempt to accelerate PointNet++ networks on SoC FPGAs, as well as the first hardware architecture exploration of graph convolutional network implementations for real-time continuous event data processing. We publish both software and hardware source code in an open repository: https://github.com/vision-agh/*** (will be published upon acceptance).