🤖 AI Summary
Existing event-based vision methods struggle to achieve high sparsity and competitive performance at the same time: spiking neural networks (SNNs) are not accurate enough on complex tasks such as object detection and optical flow estimation, and they fail to attain high activation sparsity. To address this, we propose a context-aware sparse spatiotemporal learning framework built around a novel dynamic thresholding mechanism, in which neuron firing thresholds are adaptively adjusted based on local spatiotemporal event context, enabling >95% neuron sparsity without explicit sparsity regularization. The method preserves model compactness while substantially improving accuracy, achieving state-of-the-art performance on object detection and optical flow estimation benchmarks including N-Caltech101 and DSEC. Furthermore, it delivers a 2.3× inference speedup and a 68% energy reduction on edge devices, advancing the deployment of efficient, brain-inspired visual understanding systems.
📝 Abstract
Event-based cameras have emerged as a promising paradigm for robot perception, offering high temporal resolution, high dynamic range, and robustness to motion blur. However, existing deep learning-based event processing methods often fail to fully exploit the sparse nature of event data, which complicates their integration into resource-constrained edge applications. While neuromorphic computing provides an energy-efficient alternative, spiking neural networks struggle to match the performance of state-of-the-art models in complex event-based vision tasks such as object detection and optical flow. Moreover, achieving high activation sparsity in neural networks remains difficult and often demands careful manual tuning of sparsity-inducing loss terms. Here, we propose Context-aware Sparse Spatiotemporal Learning (CSSL), a novel framework that introduces context-aware thresholding to dynamically regulate neuron activations based on the input distribution, naturally reducing activation density without explicit sparsity constraints. Applied to event-based object detection and optical flow estimation, CSSL achieves comparable or superior performance to state-of-the-art methods while maintaining extremely high neuronal sparsity. Our experimental results highlight CSSL's crucial role in enabling efficient event-based vision for neuromorphic processing.
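To make the idea of thresholding "based on the input distribution" concrete, the following is a minimal illustrative sketch, not the CSSL implementation. It assumes, purely for illustration, that the context is summarized by the mean and standard deviation of a layer's pre-activations and that the threshold is `mean + k * std`; the function names and the parameter `k` are hypothetical and do not come from the paper.

```python
import random
import statistics


def context_aware_threshold(activations, k=1.0):
    """Return a threshold derived from the input's own statistics.

    Assumption: the "context" is summarized as mean + k * std of the
    pre-activations. The actual context function used by CSSL is not
    specified in the abstract.
    """
    mean = statistics.fmean(activations)
    std = statistics.pstdev(activations)
    return mean + k * std


def sparse_activation(activations, k=1.0):
    """Zero every value that does not exceed the input-dependent threshold."""
    theta = context_aware_threshold(activations, k)
    return [a if a > theta else 0.0 for a in activations]


def sparsity(values):
    """Fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0.0) / len(values)


random.seed(0)
# Hypothetical pre-activations for one layer (standard-normal noise).
pre = [random.gauss(0.0, 1.0) for _ in range(10_000)]
out = sparse_activation(pre, k=2.0)
print(f"sparsity induced by the threshold: {sparsity(out):.3f}")
```

Because the threshold moves with the input statistics, the fraction of active units in this sketch is set by `k` rather than by a sparsity penalty in the loss, which is the general effect the abstract describes.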