AI Summary
To address the perception gaps and motion blur caused by the fixed frame rates of conventional LiDAR and RGB cameras in high-speed dynamic scenarios, this paper proposes the first continuous-time 3D object detection framework based on stereo event cameras. Methodologically, we design a dual-branch filtering network to jointly extract semantic and geometric features from asynchronous event streams, and introduce a spatiotemporal alignment mechanism together with a center-aligned regression optimization strategy, enabling end-to-end, purely event-driven 3D detection. Our key contributions are: (i) the first demonstration of continuous-time modeling and 3D localization using only stereo event data, bypassing frame-rate limitations entirely; and (ii) a novel dual-filtering mechanism and center-aligned regression that significantly improve detection accuracy and robustness for fast-moving objects. Experiments on multiple high-speed dynamic sequences show substantial improvements over state-of-the-art event-based and frame-based 3D detection methods.
Abstract
3D object detection is essential for autonomous systems, enabling precise localization and dimension estimation. While LiDAR and RGB cameras are widely used, their fixed frame rates create perception gaps in high-speed scenarios. Event cameras, with their asynchronous nature and high temporal resolution, offer a solution by capturing motion continuously. A recent approach integrates event cameras with conventional sensors for continuous-time detection, but it struggles in fast-motion scenarios because of its dependency on synchronized sensors. We propose a novel stereo 3D object detection framework that relies solely on event cameras, eliminating the need for conventional 3D sensors. To compensate for the lack of semantic and geometric information in event data, we introduce a dual filter mechanism that extracts both. Additionally, we enhance regression by aligning bounding boxes with object-centric information. Experiments show that our method outperforms prior approaches in dynamic environments, demonstrating the potential of event cameras for robust, continuous-time 3D perception. The code is available at https://github.com/mickeykang16/Ev-Stereo3D.
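The abstract describes networks that consume asynchronous event streams. As background, a common preprocessing step for such networks is to bin raw events (pixel coordinates, timestamp, polarity) into a fixed-size spatio-temporal tensor that a standard backbone can process. The sketch below is a generic illustration of that idea with assumed names and shapes; it is not the paper's actual pipeline, which should be consulted in the linked repository.

```python
import numpy as np

def events_to_voxel_grid(events, num_bins, height, width):
    """Bin asynchronous events into a (num_bins, H, W) tensor.

    `events` is an (N, 4) array of (x, y, t, polarity) rows -- an assumed
    layout for illustration, not the format used by Ev-Stereo3D.
    Positive polarity adds +1, negative adds -1, at the pixel and
    temporal bin of each event.
    """
    voxel = np.zeros((num_bins, height, width), dtype=np.float32)
    x = events[:, 0].astype(int)
    y = events[:, 1].astype(int)
    t = events[:, 2]
    p = events[:, 3]
    # Normalize timestamps to [0, num_bins - 1]; guard against a
    # zero-duration window to avoid division by zero.
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9) * (num_bins - 1)
    bins = np.clip(t_norm.astype(int), 0, num_bins - 1)
    # Unbuffered accumulation so repeated (bin, y, x) indices all count.
    np.add.at(voxel, (bins, y, x), np.where(p > 0, 1.0, -1.0))
    return voxel
```

A stereo event setup would build one such grid per camera and feed both to the network; the continuous-time property comes from being able to re-bin any time window on demand rather than waiting for a fixed frame interval.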