Spatio-Temporal State Space Model For Efficient Event-Based Optical Flow

πŸ“… 2025-06-09
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
To address the high computational cost of deep learning models (CNNs/RNNs/ViTs) and the weak spatio-temporal modeling capability of asynchronous models (SNNs/GNNs) in event-camera optical flow estimation, this paper proposes STSSM, a lightweight spatio-temporal module built on State Space Models (SSMs). STSSM processes event streams with an efficient SSM kernel that captures long-range spatio-temporal dependencies, preserving low-latency processing while significantly enhancing representational capacity. On the DSEC benchmark, the method achieves competitive accuracy with 4.5Γ— faster inference and 8Γ— lower computational cost than TMA, and 2Γ— lower computational cost than EV-FlowNet, offering a strong accuracy-efficiency trade-off for event-based optical flow estimation.
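The paper's code is not yet released, so the following is only a minimal sketch of the generic discretized linear state-space recurrence that SSM-based modules like STSSM build on (h[k] = A h[k-1] + B u[k], y[k] = C h[k]); the function name and the loop-based scan are illustrative, not the paper's implementation, which would use an optimized parallel kernel.

```python
import numpy as np

def ssm_scan(A, B, C, u):
    """Sequentially apply a discretized linear state-space model to a
    1-D input sequence u:
        h[k] = A @ h[k-1] + B * u[k]   (state update)
        y[k] = C @ h[k]                (readout)
    A: (n, n) state matrix, B: (n,) input vector, C: (n,) output vector.
    Returns the output sequence y as a NumPy array."""
    n = A.shape[0]
    h = np.zeros(n)          # initial hidden state
    ys = []
    for u_k in u:
        h = A @ h + B * u_k  # recurrent state update
        ys.append(C @ h)     # project state to output
    return np.array(ys)

# With a scalar state that decays by 0.5 per step, an impulse input
# produces a geometrically decaying output.
y = ssm_scan(np.array([[0.5]]), np.array([1.0]), np.array([1.0]),
             [1.0, 0.0, 0.0])
# y == [1.0, 0.5, 0.25]
```

Because the recurrence is linear, the same computation can be reformulated as a convolution or a parallel associative scan, which is what gives SSM layers their efficiency advantage over attention at long sequence lengths.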

πŸ“ Abstract
Event cameras unlock new frontiers that were previously unthinkable with standard frame-based cameras. One notable example is low-latency motion estimation (optical flow), which is critical for many real-time applications. In such applications, the computational efficiency of algorithms is paramount. Although recent deep learning paradigms such as CNNs, RNNs, and ViTs have shown remarkable performance, they often lack the desired computational efficiency. Conversely, asynchronous event-based methods, including SNNs and GNNs, are computationally efficient; however, these approaches fail to capture sufficient spatio-temporal information, a powerful feature required to achieve better performance in optical flow estimation. In this work, we introduce the Spatio-Temporal State Space Model (STSSM) module along with a novel network architecture to develop an extremely efficient solution with competitive performance. Our STSSM module leverages state-space models to effectively capture spatio-temporal correlations in event data, offering higher performance with lower complexity than ViT- and CNN-based architectures in similar settings. Our model achieves 4.5x faster inference and 8x lower computation compared to TMA, and 2x lower computation compared to EV-FlowNet, with competitive performance on the DSEC benchmark. Our code will be available at https://github.com/AhmedHumais/E-STMFlow
Problem

Research questions and friction points this paper is trying to address.

Improving computational efficiency in event-based optical flow
Capturing sufficient spatio-temporal information in motion estimation
Balancing performance and complexity in deep learning models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Spatio-Temporal State Space Model (STSSM)
Efficient event-based optical flow
Lower complexity than ViT and CNN
M. Humais
Advanced Research and Innovation Center (ARIC), Khalifa University, Abu Dhabi, UAE
Xiaoqian Huang
Advanced Research and Innovation Center (ARIC), Khalifa University, Abu Dhabi, UAE
Hussain Sajwani
Advanced Research and Innovation Center (ARIC), Khalifa University, Abu Dhabi, UAE
Sajid Javed
Assistant Professor, Khalifa University of Science and Technology, UAE
Computer Vision, Computational Pathology
Yahya H. Zweiri
Advanced Research and Innovation Center (ARIC), Khalifa University, Abu Dhabi, UAE