🤖 AI Summary
This work addresses the instability of 3D line reconstruction from event cameras in motion, which is exacerbated by sparse line features and noise, and which existing approaches typically handle with auxiliary sensors. We propose the first purely event-driven framework for 3D line reconstruction, leveraging multi-temporal event data to extract robust line trajectories. A novel geometric cost function jointly optimizes the 3D line map and camera poses, effectively mitigating projective distortion and depth ambiguity. Operating without external sensors, our method demonstrates strong robustness against noise and appearance variations and supports multimodal observation fusion. Extensive experiments across multiple datasets show significant improvements in both mapping accuracy and pose estimation, validating the efficacy and flexibility of line-based representations for real-world deployment.
📝 Abstract
Event cameras in motion tend to detect object boundaries and texture edges, which produce lines of brightness changes, especially in man-made environments. While lines can constitute a robust intermediate representation that is consistently observed, their sparse nature means that even minor estimation errors can cause drastic deterioration. Only a few previous works utilize lines, and they are often accompanied by additional sensors to compensate for the severe domain discrepancies and unpredictable noise characteristics of event sensors. We propose a method that stably extracts tracks of lines under varying appearances by observing multiple representations from various time slices of events, compensating for potential adverse artifacts within the event data. We then propose geometric cost functions that refine the 3D line maps and camera poses, eliminating projective distortions and depth ambiguities. The resulting 3D line maps are highly compact, and our proposed cost function can be adapted to any observation from which line structures or their projections can be detected and extracted, including 3D point cloud maps or image observations. We demonstrate that our formulation yields a significant performance boost in event-based mapping and pose refinement across diverse datasets, and can be flexibly applied to multimodal scenarios. Our results confirm that the proposed line-based formulation is a robust and effective approach for the practical deployment of event-based perceptual modules. Project page: https://gwangtak.github.io/roel/