🤖 AI Summary
This work addresses the challenge of aligning sparse, asynchronous events from an event camera with dense LiDAR maps under significant modality discrepancies, particularly in GPS-denied and visually degraded environments. To improve localization robustness, the authors propose a dual-task framework that jointly learns edge-aware structure and dense event-depth flow. The method explicitly couples edge geometry with flow estimation, incorporates modality-invariant geometric constraints, and enforces consistency between the two tasks through cross-modal fusion and multi-step iterative refinement, yielding a motion representation that is both edge-perceptive and depth-aligned. Camera pose is then recovered with a Perspective-n-Point (PnP) solver. Extensive experiments on multiple challenging datasets show substantial improvements over state-of-the-art approaches. The code, models, and demonstration videos are publicly released.
📝 Abstract
Event cameras offer high-temporal-resolution sensing that remains reliable under high-speed motion and challenging lighting, making them promising for localization against LiDAR point clouds in GPS-denied and visually degraded environments. However, aligning sparse, asynchronous events with dense LiDAR maps is fundamentally ill-posed, as direct correspondence estimation suffers from the modality gap. We propose LEAR, a dual-task learning framework that jointly estimates edge structures and dense event-depth flow fields to bridge the sensing-modality divide. Instead of treating edges as a post-hoc aid, LEAR couples them with flow estimation through a cross-modal fusion mechanism that injects modality-invariant geometric cues into the motion representation, and an iterative refinement strategy that enforces mutual consistency between the two tasks over multiple update steps. This synergy produces edge-aware, depth-aligned flow fields that enable more robust and accurate pose recovery via Perspective-n-Point (PnP) solvers. On several popular and challenging datasets, LEAR outperforms the strongest prior methods. The source code, trained models, and demo videos are made publicly available online.
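The final step of the pipeline, recovering pose from 3D-2D correspondences with a PnP solver, is a standard geometric routine. As a minimal self-contained sketch (not the paper's implementation, which would typically use a robust solver such as OpenCV's solvePnPRansac), a Direct Linear Transform (DLT) solver recovers a camera pose from n ≥ 6 correspondences:

```python
import numpy as np

def dlt_pnp(pts3d, pts2d, K):
    """Recover camera pose (R, t) from n >= 6 3D-2D correspondences
    via the Direct Linear Transform, so that  x ~ K (R X + t)."""
    pts3d = np.asarray(pts3d, float)
    pts2d = np.asarray(pts2d, float)
    n = len(pts3d)
    # Back-project pixels to normalized camera coordinates.
    uv1 = np.hstack([pts2d, np.ones((n, 1))])
    xn = (np.linalg.inv(K) @ uv1.T).T
    # Each correspondence contributes two linear equations in P = [R|t].
    A = []
    for (X, Y, Z), (x, y, _) in zip(pts3d, xn):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -x * X, -x * Y, -x * Z, -x])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -y * X, -y * Y, -y * Z, -y])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    P = Vt[-1].reshape(3, 4)          # null vector, up to scale and sign
    # Pick the sign that places the points in front of the camera.
    Xh = np.hstack([pts3d, np.ones((n, 1))])
    if np.sum(Xh @ P[2]) < 0:
        P = -P
    # Project the left 3x3 block onto SO(3) to get a valid rotation.
    U, S, Vt2 = np.linalg.svd(P[:, :3])
    R = U @ Vt2
    t = P[:, 3] / S.mean()            # undo the unknown global scale
    return R, t
```

In LEAR, the 2D points would come from the predicted event-depth flow field and the 3D points from the LiDAR map; the hedged sketch above only illustrates the geometric recovery itself, on clean synthetic correspondences.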