🤖 AI Summary
Existing mmWave radar-based scene flow estimation relies heavily on costly LiDAR supervision, while visual-inertial (VI) sensors suffer from weak 3D motion perception and inertial drift, making them unreliable supervisors on their own. To address this, we propose a LiDAR-free end-to-end learning framework. Our key contributions are: (1) a drift-free rigid-body transformation estimator that jointly leverages visual and mmWave geometric constraints to suppress cumulative errors in VI odometry; and (2) a motion-model-guided neural network that fuses optical and mmWave radar measurements to jointly extract scene flow supervision for both static and dynamic points. Experiments demonstrate that our method outperforms LiDAR-supervised state-of-the-art approaches under low-visibility conditions (e.g., smoke), significantly enhancing robustness. Moreover, it enables crowdsourced training on data from intelligent vehicles, eliminating dependence on LiDAR infrastructure.
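The sketch below illustrates one plausible reading of the drift-free estimator: a drifting kinematic (VI) ego-motion prediction is blended with a network-learned visual-mmWave rigid transform on SE(3). The function name `blend_se3`, the fixed weight, and the rotation-vector blending are illustrative assumptions for exposition, not the paper's actual formulation.

```python
# Illustrative sketch only (not the paper's method): fuse a drifting VI
# ego-motion estimate with a learned visual-mmWave rigid transform.
import numpy as np
from scipy.spatial.transform import Rotation as R

def blend_se3(T_vi: np.ndarray, T_learned: np.ndarray, w: float = 0.7) -> np.ndarray:
    """Blend two 4x4 rigid transforms.

    Rotations are blended on the rotation-vector (axis-angle)
    parameterization, a reasonable approximation when the two estimates
    are close; translations are blended linearly. `w` weights the learned
    transform against the kinematic (VI) prediction.
    """
    r_vi = R.from_matrix(T_vi[:3, :3]).as_rotvec()
    r_nn = R.from_matrix(T_learned[:3, :3]).as_rotvec()

    T = np.eye(4)
    T[:3, :3] = R.from_rotvec((1.0 - w) * r_vi + w * r_nn).as_matrix()
    T[:3, 3] = (1.0 - w) * T_vi[:3, 3] + w * T_learned[:3, 3]
    return T
```

In practice the weighting would presumably be learned or uncertainty-dependent rather than fixed; the sketch only shows how geometric constraints from the radar side can pull a VI estimate back before drift accumulates.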
📝 Abstract
This work proposes a scene flow estimation framework for mmWave radar that is supervised by data from a widely available visual-inertial (VI) sensor suite, enabling crowdsourced training data from smart vehicles. Current scene flow estimation methods for mmWave radar are typically supervised by dense point clouds from 3D LiDARs, which are expensive and not widely deployed on smart vehicles. While VI data are more accessible, visual images alone cannot capture the 3D motion of moving objects, making it difficult to supervise their scene flow. Moreover, the temporal drift of the VI rigid transformation degrades the scene flow estimation of static points. To address these challenges, we propose a drift-free rigid transformation estimator that fuses kinematic model-based ego-motion with neural network-learned results. It provides strong supervision signals for radar-based rigid transformation and infers the scene flow of static points. We then develop an optical-mmWave supervision extraction module that derives supervision signals for the radar rigid transformation and scene flow. It strengthens the supervision by learning the scene flow of dynamic points under the joint constraints of optical and mmWave radar measurements. Extensive experiments demonstrate that, in smoke-filled environments, our method even outperforms state-of-the-art (SOTA) approaches that use costly LiDARs.
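For intuition on why a drift-free rigid transformation directly supervises static points: in the ego frame, a static point p observed at time t reappears at Rp + t at time t+1, so its scene flow is f = Rp + t - p. Below is a minimal sketch under that standard rigid-motion assumption; the variable and function names are ours, not the paper's.

```python
# Minimal sketch: scene flow labels for static points induced purely by
# ego-motion. `points` is an (N, 3) array in the sensor frame and `T` a
# 4x4 rigid transform from frame t to t+1; names are illustrative.
import numpy as np

def static_scene_flow(points: np.ndarray, T: np.ndarray) -> np.ndarray:
    """f_i = R @ p_i + t - p_i for every static point p_i."""
    Rm, t = T[:3, :3], T[:3, 3]
    return points @ Rm.T + t - points
```

Dynamic points violate this rigid-motion model, which is why the framework pairs the rigid-transform supervision with joint optical and mmWave radar constraints for moving objects.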