Spatio-Temporal State Space Model For Efficient Event-Based Optical Flow

πŸ“… 2025-06-09
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
To address the high computational cost of deep learning models (CNNs/RNNs/ViTs) and the weak spatio-temporal modeling capability of asynchronous models (SNNs/GNNs) in event-camera optical flow estimation, this paper proposes STSSM, a lightweight spatio-temporal module built on State Space Models (SSMs). STSSM processes event streams with an efficient SSM kernel that captures long-range spatio-temporal dependencies, preserving low-latency processing while significantly enhancing representational capacity. On the DSEC benchmark, the method achieves competitive accuracy with 4.5Γ— faster inference and 8Γ— lower computational cost than TMA, and 2Γ— lower computational cost than EV-FlowNet, offering a strong accuracy-efficiency trade-off for event-based optical flow estimation.
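The paper's code is not yet released, so the following is only a minimal sketch of the generic discretized linear state-space recurrence that SSM-based modules like STSSM build on (h[k] = A h[k-1] + B u[k], y[k] = C h[k]); the function name and the loop-based scan are illustrative, not the paper's implementation, which would use an optimized parallel kernel.

```python
import numpy as np

def ssm_scan(A, B, C, u):
    """Sequentially apply a discretized linear state-space model to a
    1-D input sequence u:
        h[k] = A @ h[k-1] + B * u[k]   (state update)
        y[k] = C @ h[k]                (readout)
    A: (n, n) state matrix, B: (n,) input vector, C: (n,) output vector.
    Returns the output sequence y as a NumPy array."""
    n = A.shape[0]
    h = np.zeros(n)          # initial hidden state
    ys = []
    for u_k in u:
        h = A @ h + B * u_k  # recurrent state update
        ys.append(C @ h)     # project state to output
    return np.array(ys)

# With a scalar state that decays by 0.5 per step, an impulse input
# produces a geometrically decaying output.
y = ssm_scan(np.array([[0.5]]), np.array([1.0]), np.array([1.0]),
             [1.0, 0.0, 0.0])
# y == [1.0, 0.5, 0.25]
```

Because the recurrence is linear, the same computation can be reformulated as a convolution or a parallel associative scan, which is what gives SSM layers their efficiency advantage over attention at long sequence lengths.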

πŸ“ Abstract
Event cameras unlock new frontiers that were previously unthinkable with standard frame-based cameras. One notable example is low-latency motion estimation (optical flow), which is critical for many real-time applications. In such applications, the computational efficiency of algorithms is paramount. Although recent deep learning paradigms such as CNNs, RNNs, and ViTs have shown remarkable performance, they often lack the desired computational efficiency. Conversely, asynchronous event-based methods, including SNNs and GNNs, are computationally efficient; however, these approaches fail to capture sufficient spatio-temporal information, a powerful feature required to achieve better performance in optical flow estimation. In this work, we introduce the Spatio-Temporal State Space Model (STSSM) module along with a novel network architecture to develop an extremely efficient solution with competitive performance. Our STSSM module leverages state-space models to effectively capture spatio-temporal correlations in event data, offering higher performance with lower complexity than ViT- and CNN-based architectures in similar settings. Our model achieves 4.5x faster inference and 8x lower computation compared to TMA, and 2x lower computation compared to EV-FlowNet, with competitive performance on the DSEC benchmark. Our code will be available at https://github.com/AhmedHumais/E-STMFlow
Problem

Research questions and friction points this paper is trying to address.

Improving computational efficiency in event-based optical flow
Capturing sufficient spatio-temporal information in motion estimation
Balancing performance and complexity in deep learning models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Spatio-Temporal State Space Model (STSSM)
Efficient event-based optical flow
Lower complexity than ViT and CNN
M. Humais
Advanced Research and Innovation Center (ARIC), Khalifa University, Abu Dhabi, UAE
Xiaoqian Huang
Advanced Research and Innovation Center (ARIC), Khalifa University, Abu Dhabi, UAE
Hussain Sajwani
Advanced Research and Innovation Center (ARIC), Khalifa University, Abu Dhabi, UAE
Sajid Javed
Assistant Professor, Khalifa University of Science and Technology, UAE
Computer Vision, Computational Pathology
Yahya H. Zweiri
Advanced Research and Innovation Center (ARIC), Khalifa University, Abu Dhabi, UAE