Self-Supervised Event Representations: Towards Accurate, Real-Time Perception on SoC FPGAs

📅 2025-05-12
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address the trade-off between high-precision perception and temporal fidelity in sparse, asynchronous event camera streams, this paper proposes a time-discretisation-free, per-pixel encoding scheme that jointly represents event timestamps and polarities, integrated with a self-supervised GRU network to yield low-latency, high-fidelity event representations. The authors introduce the first temporally self-supervised encoding mechanism, eliminating the time distortions inherent in hand-crafted event aggregation, and achieve the first end-to-end hardware deployment of an event-driven recurrent representation on a System-on-Chip (SoC) FPGA. Evaluated on the Gen1 and 1 Mpx datasets, the method improves mean Average Precision (mAP) by 2.4% and 0.6%, respectively. FPGA measurements demonstrate inference latency below 1 µs and power consumption of only 1–2 W, satisfying stringent real-time and ultra-low-power requirements for edge perception.

📝 Abstract
Event cameras offer significant advantages over traditional frame-based sensors: microsecond temporal resolution, robustness under varying lighting conditions and low power consumption. Nevertheless, the effective processing of their sparse, asynchronous event streams remains challenging. Existing approaches fall into two distinct groups. The first processes event data directly with neural models, such as Spiking Neural Networks or Graph Convolutional Neural Networks, but this often compromises qualitative performance. The second converts events into dense representations with handcrafted aggregation functions, which can boost accuracy at the cost of temporal fidelity. This paper introduces a novel Self-Supervised Event Representation (SSER) method leveraging Gated Recurrent Unit (GRU) networks to achieve precise per-pixel encoding of event timestamps and polarities without temporal discretisation. The recurrent layers are trained in a self-supervised manner to maximise the fidelity of event-time encoding. Inference is performed with event representations generated asynchronously, ensuring compatibility with high-throughput sensors. Experimental validation demonstrates that SSER outperforms aggregation-based baselines, achieving improvements of 2.4% and 0.6% mAP on the Gen1 and 1 Mpx object detection datasets, respectively. Furthermore, the paper presents the first hardware implementation of a recurrent representation for event data on a System-on-Chip FPGA, achieving sub-microsecond latency and power consumption between 1 and 2 W, suitable for real-time, power-efficient applications. Code is available at https://github.com/vision-agh/RecRepEvent.
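To make the core idea concrete, the sketch below illustrates per-pixel recurrent encoding of an asynchronous event stream: each incoming event (timestamp, polarity) updates only the GRU hidden state of its own pixel, with no temporal binning. This is an illustrative NumPy toy with random weights, not the authors' trained SSER model; the event tuple layout, feature normalisation, and hidden size are all assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, D = 4, 4, 4   # toy sensor height/width and hidden-state size (assumed)
IN = 2              # per-event input features: normalised timestamp, polarity

# Random stand-in weights; in SSER these would come from self-supervised training.
Wz, Wr, Wh = (0.5 * rng.standard_normal((D, IN)) for _ in range(3))
Uz, Ur, Uh = (0.5 * rng.standard_normal((D, D)) for _ in range(3))

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def gru_step(h, x):
    """One standard GRU update of a single pixel's hidden state h given event features x."""
    z = sigmoid(Wz @ x + Uz @ h)          # update gate
    r = sigmoid(Wr @ x + Ur @ h)          # reset gate
    h_cand = np.tanh(Wh @ x + Uh @ (r * h))
    return (1.0 - z) * h + z * h_cand

# One D-dimensional hidden state per pixel; events update states asynchronously.
state = np.zeros((H, W, D))
events = [(0, 1, 0.10, +1), (0, 1, 0.35, -1), (2, 3, 0.50, +1)]  # (y, x, t, p)
for y, x, t, p in events:
    state[y, x] = gru_step(state[y, x], np.array([t, p], dtype=float))
```

After processing, `state` is a dense H×W×D representation that a downstream detector could consume, while pixels that received no events keep their zero state, preserving the stream's sparsity.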
Problem

Research questions and friction points this paper is trying to address.

Processing sparse asynchronous event streams effectively
Balancing accuracy and temporal fidelity in event data
Achieving real-time low-power event processing on SoC FPGAs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised GRU networks for event encoding
Asynchronous event representation without temporal discretisation
Hardware implementation on SoC FPGA for low latency
Kamil Jeziorek
AGH University of Krakow
Event Cameras · Graph Neural Networks · Object Detection · Computer Vision · Hardware Acceleration
Tomasz Kryjak
Embedded Vision Systems Group, Computer Vision Laboratory, Department of Automatic Control and Robotics, AGH University of Krakow