🤖 AI Summary
Existing asynchronous event-based object detection methods struggle to balance training efficiency over long sequences with per-event inference latency, and global state updates force a trade-off between accuracy and efficiency. This work proposes Spatially-Sparse Linear Attention (SSLA), which introduces state-level sparsity into the linear attention mechanism for the first time. By combining a mixture-of-spaces state decomposition with a "scatter-compute-gather" training pipeline, SSLA enables sparse state activation while preserving training parallelism, yielding an end-to-end asynchronous detector named SSLA-Det. The method achieves state-of-the-art results among asynchronous methods, with mAP scores of 0.375 on Gen1 and 0.515 on N-Caltech101, while reducing per-event computational cost by over 20× compared to the strongest baseline.
📝 Abstract
Event cameras provide sequential visual data with spatial sparsity and high temporal resolution, making them attractive for low-latency object detection. Existing asynchronous event-based neural networks realize this low-latency advantage by updating predictions event-by-event, but still suffer from two bottlenecks: recurrent architectures are difficult to train efficiently on long sequences, and improving accuracy often increases per-event computation and latency. Linear attention is appealing in this setting because it supports parallel training and recurrent inference. However, standard linear attention updates a global state for every event, yielding a poor accuracy-efficiency trade-off, which is problematic for object detection, where fine-grained representations, and therefore fine-grained states, are preferred. The key challenge is therefore to introduce sparse state activation that exploits event sparsity while preserving efficient parallel training. We propose Spatially-Sparse Linear Attention (SSLA), which introduces a mixture-of-spaces state decomposition and a scatter-compute-gather training procedure, enabling state-level sparsity as well as training parallelism. Built on SSLA, we develop an end-to-end asynchronous linear attention model, SSLA-Det, for event-based object detection. On Gen1 and N-Caltech101, SSLA-Det achieves state-of-the-art accuracy among asynchronous methods, reaching 0.375 mAP and 0.515 mAP, respectively, while reducing per-event computation by more than 20 times compared to the strongest prior asynchronous baseline, demonstrating the potential of linear attention for low-latency event-based vision.
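The abstract's premise, that linear attention admits both a parallel training form and an equivalent recurrent per-event form, and that the standard recurrence touches the entire global state on every update, can be illustrated with a minimal NumPy sketch. This is not the paper's SSLA implementation; the function names, the unnormalized attention, and the per-cell state layout in `spatially_sparse_update` are illustrative assumptions meant only to show why the global update is costly and where state-level sparsity could cut that cost.

```python
import numpy as np

def recurrent_linear_attention(q, k, v):
    """Event-by-event form: a running state S = sum_{s<=t} k_s v_s^T.

    Each new event costs O(d * d_v) regardless of sequence length,
    which is what makes linear attention attractive for asynchronous
    inference. Note the whole state S is rewritten on EVERY event --
    the bottleneck the abstract attributes to standard linear attention.
    """
    T, d = q.shape
    d_v = v.shape[1]
    S = np.zeros((d, d_v))
    out = np.zeros((T, d_v))
    for t in range(T):
        S = S + np.outer(k[t], v[t])   # global state update
        out[t] = q[t] @ S              # read-out for this event
    return out

def parallel_linear_attention(q, k, v):
    """Equivalent training form: one causally masked matrix product,
    so all T positions are computed in parallel."""
    scores = q @ k.T                        # (T, T) pairwise q.k
    causal = np.tril(np.ones_like(scores))  # zero out future positions
    return (scores * causal) @ v

def spatially_sparse_update(state_blocks, cell_id, k_t, v_t):
    """Hypothetical sketch of state-level sparsity: the state is split
    into per-cell blocks and an event touches only the block of the
    spatial cell it falls in, so update cost scales with the block
    size rather than the full state. (Illustrative only; SSLA's actual
    mixture-of-spaces decomposition is more involved.)"""
    state_blocks[cell_id] = state_blocks[cell_id] + np.outer(k_t, v_t)
    return state_blocks
```

A quick equivalence check of the two dense forms: feeding the same `q`, `k`, `v` to `recurrent_linear_attention` and `parallel_linear_attention` yields the same outputs, which is the property that lets such models train in parallel and then infer recurrently.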