Maximizing Asynchronicity in Event-based Neural Networks

📅 2025-05-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Event cameras’ asynchronous, sparse, high-temporal-resolution output exposes fundamental limitations of conventional asynchronous-to-synchronous (A2S) representation methods, namely weak expressivity, poor generalization, and constrained real-time performance. To address these challenges, the authors propose EVA, an end-to-end asynchronous representation learning framework that, for the first time, brings linear attention and self-supervised language modeling into event-based learning. EVA employs a streaming encoder that operates at the event level, mapping each event directly to a vector without explicit synchronization preprocessing. This design simultaneously achieves high representational capacity, strong generalization, and low inference latency. On the DVS128-Gesture and N-Cars classification benchmarks, EVA outperforms existing A2S approaches; on the Gen1 object detection task it reaches 47.7 mAP, the first substantive success of the A2S paradigm on a demanding detection benchmark.

📝 Abstract
Event cameras deliver visual data with high temporal resolution, low latency, and minimal redundancy, yet their asynchronous, sparse sequential nature challenges standard tensor-based machine learning (ML). While the recent asynchronous-to-synchronous (A2S) paradigm aims to bridge this gap by asynchronously encoding events into learned representations for ML pipelines, existing A2S approaches often sacrifice representation expressivity and generalizability compared to dense, synchronous methods. This paper introduces EVA (EVent Asynchronous representation learning), a novel A2S framework to generate highly expressive and generalizable event-by-event representations. Inspired by the analogy between events and language, EVA uniquely adapts advances from language modeling in linear attention and self-supervised learning for its construction. In demonstration, EVA outperforms prior A2S methods on recognition tasks (DVS128-Gesture and N-Cars), and represents the first A2S framework to successfully master demanding detection tasks, achieving a remarkable 47.7 mAP on the Gen1 dataset. These results underscore EVA's transformative potential for advancing real-time event-based vision applications.
Problem

Research questions and friction points this paper is trying to address.

Bridging asynchronous event data with standard ML methods
Improving expressivity in event representation learning
Enhancing generalizability for event-based vision tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

EVA framework for event-by-event representation learning
Adapts language modeling techniques for event processing
Uses linear attention and self-supervised learning
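The innovation bullets above hinge on one property of linear attention: its recurrent form carries a fixed-size state, so each incoming event can be encoded in constant time without rebuilding a dense frame. The paper does not publish this pseudocode; the sketch below is an illustrative, untrained toy in NumPy, where the projection matrices `W_q`, `W_k`, `W_v`, the feature dimension `D`, and the `elu + 1` feature map are all assumptions standing in for EVA's learned components.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16  # hypothetical feature dimension

# Hypothetical random projections standing in for learned weights.
W_q, W_k, W_v = (rng.standard_normal((4, D)) for _ in range(3))

def phi(x):
    # Positive feature map (elu(x) + 1), a common linear-attention kernel.
    return np.where(x > 0, x + 1.0, np.exp(x))

def encode_stream(events):
    """Map each raw event (t, x, y, polarity) to a vector, one at a time.

    Linear attention keeps only an O(D*D) state, so the cost per event is
    constant regardless of stream length -- the property that makes
    event-by-event (asynchronous) encoding feasible.
    """
    S = np.zeros((D, D))   # running sum of phi(k) v^T
    z = np.zeros(D)        # running sum of phi(k), used as the normalizer
    out = []
    for ev in events:
        q, k, v = W_q.T @ ev, W_k.T @ ev, W_v.T @ ev
        pk = phi(k)
        S += np.outer(pk, v)            # update state with this event
        z += pk
        out.append(S.T @ phi(q) / (phi(q) @ z + 1e-6))  # query the state
    return np.stack(out)

events = rng.standard_normal((100, 4))  # toy stand-in for an event stream
reps = encode_stream(events)
print(reps.shape)  # one D-dim representation per event
```

Because the state `(S, z)` summarizes the whole history seen so far, a downstream synchronous task head can read out a representation at any event, which is the "event → vector" mapping the summary describes.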
Haiqing Hao
Tsinghua University
Nikola Zubic
Robotics and Perception Group, University of Zurich
Weihua He
Department of Precision Instrument, Tsinghua University
Zhipeng Sui
Department of Precision Instrument, Tsinghua University
Davide Scaramuzza
Professor of Robotics and Perception, University of Zurich
Robotics · Robot Vision · Micro Air Vehicles · SLAM · Robot Learning
Wenhui Wang
Department of Precision Instrument, Tsinghua University