AI Summary
This work addresses the scarcity of high-quality LiDAR scene flow annotations in real-world datasets, which severely limits the generalization of 3D motion perception models. To overcome this challenge, the authors propose a motion-centric synthetic data learning paradigm and introduce SynFlow, a dedicated generation pipeline that yields the SynFlow-4k dataset of 4,000 sequences (approximately 940k frames), 34 times larger than existing real-world benchmarks. Departing from the conventional emphasis on photorealistic sensor fidelity, the approach prioritizes modeling motion priors, demonstrating strong domain invariance and zero-shot transferability. Models trained solely on this synthetic data match supervised performance on nuScenes and surpass state-of-the-art methods by 31.8% on TruckScenes; with fine-tuning on merely 5% of real labels, they outperform models trained on full real datasets.
Abstract
Reliable 3D dynamic perception requires models that can anticipate motion beyond predefined categories, yet progress is hindered by the scarcity of dense, high-quality motion annotations. While self-supervision on unlabeled real data offers a path forward, empirical evidence suggests that scaling unlabeled data fails to close the performance gap due to noisy proxy signals. In this paper, we propose a paradigm shift: learning robust real-world motion priors entirely from scalable simulation. We introduce SynFlow, a data generation pipeline that produces a large-scale synthetic dataset specifically designed for LiDAR scene flow. Unlike prior work that prioritizes sensor-specific realism, SynFlow employs a motion-oriented strategy to synthesize diverse kinematic patterns across 4,000 sequences ($\sim$940k frames), termed SynFlow-4k. This represents a 34x scale-up in annotated volume over existing real-world benchmarks. Our experiments demonstrate that SynFlow-4k provides a highly domain-invariant motion prior. In a zero-shot regime, models trained exclusively on our synthetic data generalize across multiple real-world benchmarks, rivaling in-domain supervised baselines on nuScenes and outperforming state-of-the-art methods on TruckScenes by 31.8%. Furthermore, SynFlow-4k serves as a label-efficient foundation: fine-tuning with only 5% of real-world labels surpasses models trained from scratch on the full available budget. We open-source the pipeline and dataset to facilitate research in generalizable 3D motion estimation. More details can be found at https://kin-zhang.github.io/SynFlow.