🤖 AI Summary
Existing event-camera keypoint detection methods struggle with the motion-dependent appearance of keypoints and with event noise, which leads to poor matching robustness and degraded downstream SLAM performance. To address this, we propose a cross-modal self-supervised learning framework for event streams. It applies frame-based keypoint detectors to synchronized grayscale frames to generate temporally sparse pseudo-labels, which jointly supervise event-based keypoint detection and descriptor learning. We introduce a spatiotemporally aware event representation and design mechanisms for frame-event alignment and pseudo-label distillation. The learned keypoints and descriptors can be plugged into a modern sparse keypoint and descriptor-based SLAM framework originally developed for frame cameras. Experiments show clear improvements over state-of-the-art methods in keypoint matching accuracy and robustness under challenging motion and noise, and the resulting event-based SLAM system surpasses the state of the art in localization accuracy by a wide margin. The code and multimedia resources are publicly available.
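The summary does not spell out how the temporally sparse pseudo-labels are selected. The following is only a hypothetical sketch of one plausible rule, not the SuperEvent implementation. It assumes that a frame is labelled only when enough events arrived shortly before it (the camera was moving), and that a frame keypoint is kept only when nearby events support it. The function names, the window length, and all thresholds are illustrative assumptions.

```python
from typing import Dict, List, Sequence, Tuple

Event = Tuple[int, int, float, int]  # (x, y, timestamp in seconds, polarity)
Keypoint = Tuple[int, int]           # (x, y) from a frame-based detector

def events_in_window(events: Sequence[Event], t_end: float, window: float) -> List[Event]:
    """Return events whose timestamps lie in the half-open window (t_end - window, t_end]."""
    return [e for e in events if t_end - window < e[2] <= t_end]

def sparse_pseudo_labels(
    frames: Sequence[Tuple[float, List[Keypoint]]],
    events: Sequence[Event],
    window: float = 0.01,        # hypothetical look-back window in seconds
    min_frame_events: int = 4,   # hypothetical: minimum activity for a frame to be labelled
    radius: int = 2,             # hypothetical neighbourhood half-size in pixels
    min_local_events: int = 2,   # hypothetical: minimum event support per keypoint
) -> Dict[float, List[Keypoint]]:
    """Map frame timestamp -> kept keypoints, only for frames with enough event activity."""
    labels: Dict[float, List[Keypoint]] = {}
    for t, keypoints in frames:
        recent = events_in_window(events, t, window)
        if len(recent) < min_frame_events:
            continue  # too little motion: emit no label for this frame (temporal sparsity)
        kept = [
            (kx, ky)
            for kx, ky in keypoints
            if sum(1 for ex, ey, _, _ in recent
                   if abs(ex - kx) <= radius and abs(ey - ky) <= radius) >= min_local_events
        ]
        if kept:
            labels[t] = kept
    return labels

events = [(10, 11, 0.001, 1), (11, 10, 0.002, -1), (9, 9, 0.003, 1),
          (80, 80, 0.004, 1), (30, 30, 0.025, 1)]
frames = [(0.005, [(10, 10), (50, 50)]), (0.030, [(30, 30)])]
print(sparse_pseudo_labels(frames, events))  # {0.005: [(10, 10)]}
```

In this toy input, the second frame is skipped because only one event precedes it, and at the first frame the keypoint at (50, 50) is dropped because no events fall near it.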
📝 Abstract
Event-based keypoint detection and matching holds significant potential, since it would allow event sensors to be integrated into the highly optimized Visual SLAM systems developed for frame cameras over decades of research. Unfortunately, existing approaches struggle with the motion-dependent appearance of keypoints and the complex noise prevalent in event streams, resulting in severely limited feature matching capabilities and poor performance on downstream tasks. To mitigate this problem, we propose SuperEvent, a data-driven approach to predict stable keypoints with expressive descriptors. Because no event datasets with ground-truth keypoint labels exist, we apply existing frame-based keypoint detectors to readily available event-aligned and synchronized grayscale frames for self-supervision: we generate temporally sparse keypoint pseudo-labels, taking into account that events are a product of both scene appearance and camera motion. Combined with our novel, information-rich event representation, this enables SuperEvent to learn robust keypoint detection and description in event streams. Finally, we demonstrate the usefulness of SuperEvent by integrating it into a modern sparse keypoint and descriptor-based SLAM framework originally developed for traditional cameras, surpassing the state of the art in event-based SLAM by a wide margin. Source code and multimedia material are available at smartroboticslab.github.io/SuperEvent.
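The abstract does not describe SuperEvent's event representation, so the sketch below is not it. It illustrates a standard, generic alternative to raw event lists: a multi-timescale time surface. For each polarity and each decay constant tau, a pixel stores exp(-(t_ref - t_last) / tau), where t_last is the most recent event at that pixel. Short taus emphasize recent motion and long taus retain older structure. The function name and the tau values are illustrative assumptions.

```python
import math
from typing import List, Optional, Sequence, Tuple

Event = Tuple[int, int, float, int]  # (x, y, timestamp in seconds, polarity; >0 means ON)

def multi_scale_time_surface(
    events: Sequence[Event],
    height: int,
    width: int,
    t_ref: float,
    taus: Sequence[float] = (0.005, 0.02, 0.08),  # hypothetical decay constants in seconds
) -> List[List[List[float]]]:
    """Return 2 * len(taus) channels of shape [height][width]: ON channels first, then OFF."""
    # Latest event timestamp per polarity and pixel; None means no event seen yet.
    last: dict = {pol: [[None] * width for _ in range(height)] for pol in (1, -1)}
    for x, y, t, p in events:
        if t <= t_ref and 0 <= x < width and 0 <= y < height:
            pol = 1 if p > 0 else -1
            prev: Optional[float] = last[pol][y][x]
            if prev is None or t > prev:
                last[pol][y][x] = t
    channels: List[List[List[float]]] = []
    for pol in (1, -1):
        for tau in taus:
            # Exponential decay of the time elapsed since the last event; 0.0 where no event occurred.
            channels.append([[0.0 if ts is None else math.exp(-(t_ref - ts) / tau) for ts in row]
                             for row in last[pol]])
    return channels

surface = multi_scale_time_surface([(0, 0, 0.09, 1), (1, 0, 0.10, -1)], height=1, width=2, t_ref=0.10)
print(len(surface))  # 6 channels: 2 polarities x 3 taus
```

A network can consume such a stack as an ordinary multi-channel image. This is one reason dense, image-like event representations pair naturally with frame-style keypoint architectures.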