UFO: Unifying Feed-Forward and Optimization-based Methods for Large Driving Scene Modeling

📅 2026-02-24
🤖 AI Summary
Existing feed-forward approaches struggle with long driving sequences: their cost grows quadratically with sequence length, and they model dynamic objects poorly over extended durations. This work proposes UFO, a recurrent framework that combines the optimization-based and feed-forward paradigms for long-range 4D reconstruction. UFO maintains a 4D scene representation that is refined iteratively as new observations arrive, filters scene tokens by visibility, and models dynamic objects under object-pose guidance, improving both efficiency and accuracy. On the Waymo Open Dataset, UFO reconstructs 16-second driving logs within 0.5 seconds and outperforms both per-scene optimization and existing feed-forward methods in visual quality and geometric accuracy.

📝 Abstract
Dynamic driving scene reconstruction is critical for autonomous driving simulation and closed-loop learning. While recent feed-forward methods have shown promise for 3D reconstruction, they struggle with long-range driving sequences due to quadratic complexity in sequence length and challenges in modeling dynamic objects over extended durations. We propose UFO, a novel recurrent paradigm that combines the benefits of optimization-based and feed-forward methods for efficient long-range 4D reconstruction. Our approach maintains a 4D scene representation that is iteratively refined as new observations arrive, using a visibility-based filtering mechanism to select informative scene tokens and enable efficient processing of long sequences. For dynamic objects, we introduce an object pose-guided modeling approach that supports accurate long-range motion capture. Experiments on the Waymo Open Dataset demonstrate that our method significantly outperforms both per-scene optimization and existing feed-forward methods across various sequence lengths. Notably, our approach can reconstruct 16-second driving logs within 0.5 seconds while maintaining superior visual quality and geometric accuracy.
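
The recurrent idea in the abstract (keep a persistent scene representation, and at each step refine only the tokens the current camera can see) can be illustrated with a toy sketch. This is not the paper's implementation: `SceneToken`, `is_visible`, `recurrent_update`, the 2D geometry, the field-of-view and range defaults, and the scalar `refine` callback are all hypothetical stand-ins for the learned components, chosen only to show why per-step work can depend on the number of visible tokens rather than on the full sequence length.

```python
import math
from dataclasses import dataclass


@dataclass
class SceneToken:
    """Hypothetical scene token: a 2D world position plus a scalar feature."""
    x: float
    y: float
    feature: float = 0.0


def is_visible(token, cam_x, cam_y, cam_yaw, fov_deg=120.0, max_range=50.0):
    """Assumed visibility test: token lies within range and inside the camera's field of view."""
    dx, dy = token.x - cam_x, token.y - cam_y
    dist = math.hypot(dx, dy)
    if dist > max_range:
        return False
    if dist == 0.0:
        return True
    bearing = math.atan2(dy, dx) - cam_yaw
    # Wrap the relative bearing into [-pi, pi] before comparing to half the FOV.
    bearing = math.atan2(math.sin(bearing), math.cos(bearing))
    return abs(bearing) <= math.radians(fov_deg) / 2.0


def recurrent_update(tokens, frames, refine):
    """Refine the persistent token set frame by frame.

    frames: iterable of (cam_x, cam_y, cam_yaw, observation).
    refine: stand-in for a learned update, mapping (feature, observation) to a new feature.
    Returns the number of tokens touched at each step.
    """
    touched_per_step = []
    for cam_x, cam_y, cam_yaw, observation in frames:
        visible = [t for t in tokens if is_visible(t, cam_x, cam_y, cam_yaw)]
        for token in visible:
            token.feature = refine(token.feature, observation)
        touched_per_step.append(len(visible))
    return touched_per_step


if __name__ == "__main__":
    # 20 tokens spaced 10 m apart along a straight road; the camera drives forward.
    tokens = [SceneToken(x=10.0 * i, y=0.0) for i in range(20)]
    frames = [(0.0, 0.0, 0.0, 1.0), (50.0, 0.0, 0.0, 1.0), (100.0, 0.0, 0.0, 1.0)]
    counts = recurrent_update(tokens, frames, lambda f, o: 0.5 * f + 0.5 * o)
    print(counts)
```

In this toy setup each step touches only the handful of tokens in front of the camera, however many frames have already been processed. A real system would replace the geometric test and the scalar blend with learned modules and would handle dynamic objects separately.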
Problem

Research questions and friction points this paper is trying to address.

dynamic driving scene reconstruction
long-range 4D reconstruction
feed-forward methods
optimization-based methods
dynamic objects modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

4D scene reconstruction
recurrent paradigm
visibility-based filtering
pose-guided dynamic modeling
long-range driving sequences
Authors
Kaiyuan Tan
Xiaomi EV
Yingying Shen
Xiaomi EV
Mingfei Tu
Xiaomi EV
Haohui Zhu
Xiaomi EV
Bing Wang
Xiaomi EV
Computer Vision, Pattern Recognition, Machine Learning
Guang Chen
Xiaomi EV
Hangjun Ye
Xiaomi EV
Haiyang Sun
Xiaomi EV
World Model, Autonomous Driving, 3D Vision