VoteFlow: Enforcing Local Rigidity in Self-Supervised Scene Flow

📅 2025-03-28
🤖 AI Summary
Self-supervised LiDAR scene flow estimators typically lack a local rigid-motion prior, which leads to inconsistent motion predictions among nearby points. To address this, we propose an end-to-end differentiable rigidity constraint that operates on pillars rather than individual points. Its core component is a differentiable voting module over a discretized translation space: for each pillar, it learns a representative feature and votes for the translation shared by nearby points. This builds a local-rigidity inductive bias directly into the network architecture instead of relying on post-processing or auxiliary regularization. The module is a lightweight add-on that plugs into popular model designs and is trained end to end. On the Argoverse 2 and Waymo datasets, our approach outperforms the baseline methods with only marginal compute overhead. The source code is publicly available.

📝 Abstract
Scene flow estimation aims to recover per-point motion from two adjacent LiDAR scans. However, in real-world applications such as autonomous driving, points rarely move independently of others, especially for nearby points belonging to the same object, which often share the same motion. Incorporating this locally rigid motion constraint has been a key challenge in self-supervised scene flow estimation, which is often addressed by post-processing or appending extra regularization. While these approaches are able to improve the rigidity of predicted flows, they lack an architectural inductive bias for local rigidity within the model structure, leading to suboptimal learning efficiency and inferior performance. In contrast, we enforce local rigidity with a lightweight add-on module in neural network design, enabling end-to-end learning. We design a discretized voting space that accommodates all possible translations and then identify the one shared by nearby points by differentiable voting. Additionally, to ensure computational efficiency, we operate on pillars rather than points and learn representative features for voting per pillar. We plug the Voting Module into popular model designs and evaluate its benefit on Argoverse 2 and Waymo datasets. We outperform baseline works with only marginal compute overhead. Code is available at https://github.com/tudelft-iv/VoteFlow.
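
The voting step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a soft-argmax as the differentiable stand-in for voting, and the function name `soft_vote`, the 3x3 grid, the 0.5 m bin spacing, and the `temperature` parameter are all hypothetical choices made for illustration. Each neighbouring pillar contributes a score per candidate translation, the scores are summed, and a softmax-weighted average yields the shared translation.

```python
import math

def soft_vote(vote_logits, translations, temperature=1.0):
    """Differentiable consensus over a discretized translation space.

    vote_logits: one list of scores per neighbouring pillar, one score per bin.
    translations: list of candidate (dx, dy) translations, one per bin.
    """
    num_bins = len(translations)
    # Accumulate the votes of all neighbours into one score per bin.
    totals = [sum(v[b] for v in vote_logits) for b in range(num_bins)]
    # Numerically stable softmax over the accumulated scores.
    peak = max(totals)
    exps = [math.exp((t - peak) / temperature) for t in totals]
    norm = sum(exps)
    probs = [e / norm for e in exps]
    # Soft-argmax: the expected translation under the vote distribution.
    dx = sum(p * t[0] for p, t in zip(probs, translations))
    dy = sum(p * t[1] for p, t in zip(probs, translations))
    return (dx, dy), probs

# Hypothetical 3x3 grid of candidate translations with 0.5 m spacing.
grid = [(0.5 * i, 0.5 * j) for i in (-1, 0, 1) for j in (-1, 0, 1)]
# Three neighbouring pillars that all favour the bin (0.5, 0.0).
favoured = grid.index((0.5, 0.0))
logits = [[4.0 if b == favoured else 0.0 for b in range(len(grid))]
          for _ in range(3)]
shared, probs = soft_vote(logits, grid, temperature=0.5)
```

Because every operation here is smooth in the scores, gradients could flow back to whatever network produces them, which is the property that makes end-to-end training possible. A hard argmax over the bins would not have that property.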

Problem

Research questions and friction points this paper is trying to address.

Enforcing local rigidity in self-supervised scene flow estimation
Improving learning efficiency with an architectural inductive bias
Ensuring computational efficiency via a pillar-based voting module

Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight add-on module enforces local rigidity
Differentiable voting identifies shared translations
Pillar-based operation ensures computational efficiency
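
To make the pillar-based operation concrete, the sketch below groups LiDAR points into vertical pillars on a regular x-y grid. It is an illustration under stated assumptions, not code from the VoteFlow repository: the names `pillarize` and `pillar_centroids` and the 0.5 m pillar size are hypothetical, and the centroid is only a hand-crafted placeholder for the representative per-pillar feature that the paper learns.

```python
import math
from collections import defaultdict

def pillarize(points, pillar_size=0.5):
    """Group 3D points into vertical pillars on a regular x-y grid."""
    pillars = defaultdict(list)
    for x, y, z in points:
        # The height z is ignored when choosing the cell, so each cell
        # spans the full vertical extent of the scene (a "pillar").
        key = (math.floor(x / pillar_size), math.floor(y / pillar_size))
        pillars[key].append((x, y, z))
    return dict(pillars)

def pillar_centroids(pillars):
    """Summarize each pillar by its centroid (placeholder for a learned feature)."""
    return {
        key: tuple(sum(axis) / len(pts) for axis in zip(*pts))
        for key, pts in pillars.items()
    }

points = [(0.1, 0.2, 0.0), (0.3, 0.4, 1.5), (0.7, 0.1, 0.2), (-0.2, 0.1, 0.0)]
pillars = pillarize(points)
centroids = pillar_centroids(pillars)
```

Voting per pillar instead of per point reduces the number of voters from the number of points to the number of occupied grid cells, which is where the efficiency gain claimed above would come from.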