🤖 AI Summary
Existing self-supervised LiDAR scene flow methods distinguish static from dynamic points using sparse geometric observations, which are highly susceptible to data sparsity and occlusions; the resulting noisy pseudo-labels provide erroneous motion guidance. To address this limitation, this work proposes TrackCue, a novel framework that, for the first time, integrates dense image-space point tracking into LiDAR scene flow modeling. Using visually consistent motion compensation, TrackCue separates ego-motion from true object motion in the image plane and then lifts the resulting motion cues back into the LiDAR domain to refine static/dynamic point labels. This approach significantly improves the precision and F1 score of dynamic labels, yielding performance gains in self-supervised LiDAR scene flow estimation.
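Label quality here is reported for the dynamic class. As a reminder of how such scores are commonly computed, the sketch below treats "dynamic" as the positive class and derives precision, recall, and F1 from per-point boolean labels. The function name `dynamic_label_scores` and the toy labels are hypothetical, chosen only for illustration; they are not the paper's evaluation code or results.

```python
from typing import Dict, Sequence


def dynamic_label_scores(pred: Sequence[bool], gt: Sequence[bool]) -> Dict[str, float]:
    """Precision, recall and F1 for the dynamic class (True = dynamic point)."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")
    tp = sum(p and g for p, g in zip(pred, gt))          # dynamic, correctly flagged
    fp = sum(p and not g for p, g in zip(pred, gt))      # static point flagged as dynamic
    fn = sum((not p) and g for p, g in zip(pred, gt))    # dynamic point missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy example (invented labels): noisy geometric labels vs. refined labels.
gt = [True, True, False, False, False]
noisy = [True, False, True, True, False]
refined = [True, True, True, False, False]
noisy_scores = dynamic_label_scores(noisy, gt)
refined_scores = dynamic_label_scores(refined, gt)
```

Precision penalizes static points wrongly marked dynamic, which is the failure mode that noisy geometric labels tend to produce; F1 balances it against missed dynamic points.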
📝 Abstract
LiDAR scene flow estimation is essential for autonomous driving, as it provides the 3D motion of each point. Self-supervised approaches use static-dynamic classification to mitigate the imbalance between static and dynamic points and to derive targeted supervision. However, existing methods rely on sparse geometric observations for this classification, making them vulnerable to data sparsity and occlusions. The resulting noisy labels provide incorrect motion guidance and degrade scene flow learning. To address this, we introduce TrackCue, a tracking-guided framework for improving dynamic object representation in LiDAR scene flow estimation. Specifically, TrackCue repurposes point tracking to obtain dense image-space trajectories anchored to LiDAR points, providing motion cues beyond sparse geometric observations. Furthermore, we present a visually consistent motion compensation strategy that compares the tracked trajectories with ego-induced rigid trajectories in the image plane, effectively isolating true object motion from ego-induced apparent motion. To transfer these isolated motion cues back to the LiDAR domain, we perform visual motion cue lifting, which associates ego-compensated image trajectories with LiDAR points for static-dynamic label refinement. As a result, TrackCue produces more accurate static-dynamic classification and provides more reliable supervision for scene flow learning. Experimental results show that TrackCue significantly improves the precision and F1 score of dynamic labels, leading to performance gains in self-supervised scene flow estimation.
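The core idea of comparing tracked trajectories with ego-induced rigid trajectories can be illustrated with a minimal sketch. It assumes known ego poses mapping the anchor camera frame to each later frame, a pinhole camera, and tracker outputs supplied as input (the point tracker itself is not modeled). The aggregation (mean pixel residual), the 2-pixel threshold, and the refinement rule (use the visual cue where one exists, otherwise keep the geometric label) are hypothetical choices; names such as `Pose`, `Intrinsics`, `visual_motion_labels`, and `refine_labels` are illustrative and not TrackCue's API.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Pixel = Tuple[float, float]


@dataclass
class Pose:
    """Assumed-known ego transform: anchor camera frame -> camera frame k."""
    rotation: Tuple[Vec3, Vec3, Vec3]
    translation: Vec3

    def apply(self, p: Vec3) -> Vec3:
        return tuple(
            sum(self.rotation[i][j] * p[j] for j in range(3)) + self.translation[i]
            for i in range(3)
        )


@dataclass
class Intrinsics:
    fx: float
    fy: float
    cx: float
    cy: float

    def project(self, p: Vec3) -> Optional[Pixel]:
        """Pinhole projection; None if the point is behind the camera."""
        x, y, z = p
        if z <= 1e-6:
            return None
        return (self.fx * x / z + self.cx, self.fy * y / z + self.cy)


def visual_motion_labels(points: Sequence[Vec3],
                         tracks: Sequence[Sequence[Optional[Pixel]]],
                         ego_poses: Sequence[Pose], cam: Intrinsics,
                         thresh_px: float = 2.0) -> List[Optional[bool]]:
    """Per point: True = dynamic, False = static, None = no usable track.

    tracks[i][k] is the tracked pixel of point i in frame k (None if occluded).
    """
    labels: List[Optional[bool]] = []
    for p, track in zip(points, tracks):
        residuals = []
        for uv_tracked, pose in zip(track, ego_poses):
            # Where the point would appear if it were static (ego motion only).
            uv_rigid = cam.project(pose.apply(p))
            if uv_tracked is None or uv_rigid is None:
                continue
            residuals.append(math.hypot(uv_tracked[0] - uv_rigid[0],
                                        uv_tracked[1] - uv_rigid[1]))
        # Remaining (ego-compensated) motion decides the visual cue.
        labels.append(sum(residuals) / len(residuals) > thresh_px if residuals else None)
    return labels


def refine_labels(geometric: Sequence[bool],
                  visual: Sequence[Optional[bool]]) -> List[bool]:
    """Hypothetical lifting rule: trust the visual cue where it exists."""
    return [g if v is None else v for g, v in zip(geometric, visual)]


# Toy scene: the ego camera moves 1 m forward per frame.
cam = Intrinsics(fx=100.0, fy=100.0, cx=50.0, cy=50.0)
identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
poses = [Pose(identity, (0.0, 0.0, -1.0 * k)) for k in range(1, 3)]
points = [(1.0, 0.0, 10.0), (0.0, 0.0, 5.0), (2.0, 0.0, 8.0)]
tracks = [
    [(61.1, 50.0), (62.5, 50.0)],  # follows the rigid prediction
    [(58.0, 50.0), (66.0, 50.0)],  # drifts away from it
    [None, None],                  # occluded in every frame
]
visual = visual_motion_labels(points, tracks, poses, cam)
refined = refine_labels([True, False, True], visual)
```

Because each track is anchored to a LiDAR point, lifting reduces here to indexing by point; points without any visible track fall back to their geometric label, which is one plausible way to handle occlusion.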