🤖 AI Summary
Existing event-processing solutions for edge robotics suffer from fragmented end-to-end pipelines, high latency, and poor exploitation of event sparsity. To address these issues, this work proposes an ultra-low-latency, end-to-end edge AI platform tailored for event cameras. The platform integrates the Prophesee IMX636 event sensor with a Xilinx Zynq UltraScale+ MPSoC FPGA and features a custom hardware-optimized preprocessing pipeline (supporting multi-mode histogram accumulation and time-surface computation) together with a lightweight AI accelerator. This co-design targets high accuracy while reducing inference latency and hardware resource consumption. Evaluated on the DVS Gesture dataset, the platform achieves 94% classification accuracy in its high-accuracy configuration and sustains a throughput of 1000 fps in its low-latency configuration, while using only 33% of the FPGA's LUTs and keeping a compact memory footprint. Its modular architecture and the unused FPGA resources leave room to scale, making it well suited to high-speed closed-loop control applications such as real-time gesture interaction.
📝 Abstract
Event cameras offer significant advantages for edge robotics applications due to their asynchronous operation and sparse, event-driven output, making them well suited to tasks requiring fast and efficient closed-loop control, such as gesture-based human-robot interaction. Despite this potential, existing event-processing solutions remain limited: they often lack complete end-to-end implementations, exhibit high latency, and insufficiently exploit event data sparsity. In this paper, we present HOMI, an ultra-low-latency, end-to-end edge AI platform that pairs a Prophesee IMX636 event sensor chip with a Xilinx Zynq UltraScale+ MPSoC FPGA chip running an in-house developed AI accelerator. We have developed hardware-optimized pre-processing pipelines that support both constant-time and constant-event modes for histogram accumulation, as well as linear and exponential time surfaces. Our general-purpose implementation caters to both accuracy-driven and low-latency applications. As a use case, HOMI achieves 94% accuracy on the DVS Gesture dataset when configured for high-accuracy operation and provides a throughput of 1000 fps in its low-latency configuration. The hardware-optimized pipeline maintains a compact memory footprint and utilizes only 33% of the available LUT resources on the FPGA, leaving ample headroom for further latency reduction, model parallelization, multi-task deployments, or integration of more complex architectures.
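The pre-processing modes named in the abstract can be illustrated with a small software sketch. This is not HOMI's hardware implementation; it is a plain-Python approximation under stated assumptions: events are `(x, y, polarity, timestamp_us)` tuples sorted by time, polarity is encoded as 0 or 1, and the time surfaces follow a common textbook formulation (linear decay `1 - dt/tau` clipped at zero, exponential decay `exp(-dt/tau)`). All function and parameter names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Assumed event layout: (x, y, polarity in {0, 1}, timestamp in microseconds).
Event = Tuple[int, int, int, int]


def split_constant_time(events: Sequence[Event], window_us: int) -> List[List[Event]]:
    """Constant-time mode: every window spans the same duration.

    Windows without events are still emitted (as empty lists), so the
    output rate is fixed regardless of scene activity.
    """
    windows: List[List[Event]] = []
    current: List[Event] = []
    start = None
    for ev in events:
        t = ev[3]
        if start is None:
            start = t
        # Close every window that ended before this event arrived.
        while t >= start + window_us:
            windows.append(current)
            current = []
            start += window_us
        current.append(ev)
    if current:
        windows.append(current)
    return windows


def split_constant_event(events: Sequence[Event], count: int) -> List[List[Event]]:
    """Constant-event mode: every full window holds the same number of events."""
    windows: List[List[Event]] = []
    current: List[Event] = []
    for ev in events:
        current.append(ev)
        if len(current) == count:
            windows.append(current)
            current = []
    if current:  # trailing partial window
        windows.append(current)
    return windows


def histogram(window: Sequence[Event], width: int, height: int) -> List[List[List[int]]]:
    """Per-pixel event counts, one channel per polarity: hist[p][y][x]."""
    hist = [[[0] * width for _ in range(height)] for _ in range(2)]
    for x, y, p, _ in window:
        hist[p][y][x] += 1
    return hist


def time_surface(window: Sequence[Event], width: int, height: int,
                 t_ref: int, tau_us: float, mode: str = "exponential") -> List[List[float]]:
    """Decay each pixel's most recent timestamp relative to t_ref.

    Pixels that saw no event stay at 0.0. Polarity is ignored here
    for brevity (an assumption of this sketch).
    """
    last = [[None] * width for _ in range(height)]
    for x, y, _, t in window:
        last[y][x] = t  # events are time-sorted, so the last write wins
    surface = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if last[y][x] is None:
                continue
            dt = t_ref - last[y][x]
            if mode == "linear":
                surface[y][x] = max(0.0, 1.0 - dt / tau_us)
            else:
                surface[y][x] = math.exp(-dt / tau_us)
    return surface
```

In this sketch, constant-time windows give a fixed frame rate (a 1000 µs window corresponds to 1000 frames per second), whereas constant-event windows adapt their duration to scene activity. Either windowing can feed the histogram or the time-surface representation.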