🤖 AI Summary
This work addresses the trade-off between efficiency and fidelity in existing 4D Gaussian splatting methods for autonomous driving scene reconstruction, where per-scene optimization lacks scalability and feed-forward approaches suffer from insufficient photometric accuracy. To overcome these limitations, we propose ReconDrive, a feed-forward framework built upon the 3D foundation model VGGT. It achieves efficient, high-fidelity reconstruction by decoupling the prediction of spatial coordinates from that of appearance attributes through hybrid Gaussian prediction heads, and by composing static and dynamic elements into a 4D scene with explicit velocity-based motion modeling. Experiments on nuScenes demonstrate that ReconDrive significantly outperforms current feed-forward methods in reconstruction quality, novel-view synthesis, and 3D perception, with performance comparable to per-scene optimization and orders-of-magnitude faster inference.
📝 Abstract
High-fidelity visual reconstruction and novel-view synthesis are essential for realistic closed-loop evaluation in autonomous driving. While 4D Gaussian Splatting (4DGS) offers a promising balance of accuracy and efficiency, existing per-scene optimization methods require costly iterative refinement, rendering them unscalable for extensive urban environments. Conversely, current feed-forward approaches often suffer from degraded photometric quality. To address these limitations, we propose ReconDrive, a feed-forward framework that leverages and extends the 3D foundation model VGGT for rapid, high-fidelity 4DGS generation. Our architecture introduces two core adaptations to tailor the foundation model to dynamic driving scenes: (1) Hybrid Gaussian Prediction Heads, which decouple the regression of spatial coordinates and appearance attributes to overcome the photometric deficiencies inherent in generalized foundation features; and (2) a Static-Dynamic 4D Composition strategy that explicitly captures temporal motion via velocity modeling to represent complex dynamic environments. Benchmarked on nuScenes, ReconDrive significantly outperforms existing feed-forward baselines in reconstruction, novel-view synthesis, and 3D perception. It achieves performance competitive with per-scene optimization while being orders of magnitude faster, providing a scalable and practical solution for realistic driving simulation.
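To make the idea of velocity-based static-dynamic composition more concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the motion model, so the constant-velocity assumption, the `Gaussian` class, the `is_dynamic` flag, and the `compose_at_time` function are all hypothetical names introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Gaussian:
    """Hypothetical, stripped-down Gaussian primitive (illustration only)."""
    center: Vec3       # position at the reference time t_ref
    velocity: Vec3     # assumed constant linear velocity, in units per second
    is_dynamic: bool   # assumed static/dynamic label


def compose_at_time(gaussians: List[Gaussian], t_ref: float, t: float) -> List[Vec3]:
    """Return Gaussian centers at time t.

    Static Gaussians keep their reference position; dynamic ones are
    translated by velocity * (t - t_ref) under the constant-velocity assumption.
    """
    dt = t - t_ref
    centers: List[Vec3] = []
    for g in gaussians:
        if g.is_dynamic:
            moved = tuple(c + v * dt for c, v in zip(g.center, g.velocity))
            centers.append(moved)
        else:
            centers.append(g.center)
    return centers


# Toy scene: one static Gaussian and one moving along x at 5 units/s.
scene = [
    Gaussian(center=(0.0, 0.0, 0.0), velocity=(0.0, 0.0, 0.0), is_dynamic=False),
    Gaussian(center=(10.0, 2.0, 0.0), velocity=(5.0, 0.0, 0.0), is_dynamic=True),
]
centers = compose_at_time(scene, t_ref=0.0, t=0.5)
print(centers)
```

In this toy setup only the Gaussian centers are advanced in time; appearance attributes such as color, opacity, and covariance are left out, since the abstract describes them as being predicted by a separate head.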