🤖 AI Summary
Existing feedforward approaches struggle with high computational costs and inadequate modeling of dynamic objects in long-duration driving scene reconstruction. This work proposes UFO, the first unified framework that integrates optimization and feedforward paradigms for recursive 4D reconstruction. UFO introduces an iteratively updatable 4D scene representation, a visibility-aware token filtering mechanism, and object-pose-guided dynamic modeling to significantly enhance both efficiency and accuracy. Evaluated on the Waymo Open Dataset, UFO achieves high-quality reconstruction of 16-second driving logs in under 0.5 seconds, outperforming current methods in both visual fidelity and geometric precision.
📝 Abstract
Dynamic driving scene reconstruction is critical for autonomous driving simulation and closed-loop learning. While recent feed-forward methods have shown promise for 3D reconstruction, they struggle with long-range driving sequences due to quadratic complexity in sequence length and challenges in modeling dynamic objects over extended durations. We propose UFO, a novel recurrent paradigm that combines the benefits of optimization-based and feed-forward methods for efficient long-range 4D reconstruction. Our approach maintains a 4D scene representation that is iteratively refined as new observations arrive, using a visibility-based filtering mechanism to select informative scene tokens and enable efficient processing of long sequences. For dynamic objects, we introduce an object pose-guided modeling approach that supports accurate long-range motion capture. Experiments on the Waymo Open Dataset demonstrate that our method significantly outperforms both per-scene optimization and existing feed-forward methods across various sequence lengths. Notably, our approach can reconstruct 16-second driving logs within 0.5 seconds while maintaining superior visual quality and geometric accuracy.
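To make the recurrent idea concrete, here is a minimal sketch of how a visibility-based token filter might slot into an iterative update loop. This is an illustrative assumption, not the paper's actual implementation: the function names (`filter_visible_tokens`, `recurrent_update`), the cosine frustum test, and the distance-discounted score are all hypothetical stand-ins for whatever learned visibility mechanism UFO uses.

```python
import numpy as np

def filter_visible_tokens(tokens, positions, cam_pos, cam_dir, fov_cos=0.5, top_k=4):
    """Keep up to top_k scene tokens that are most visible from the camera.

    Hypothetical scoring: a token is 'visible' if its direction from the
    camera aligns with the viewing direction (cosine > fov_cos), and visible
    tokens are ranked by alignment discounted by distance.
    """
    rays = positions - cam_pos
    dists = np.linalg.norm(rays, axis=1)
    rays = rays / np.maximum(dists[:, None], 1e-8)
    align = rays @ cam_dir                # cosine to the viewing direction
    idx = np.nonzero(align > fov_cos)[0]  # tokens inside the rough frustum
    order = np.argsort(align[idx] / dists[idx])[::-1]
    keep = idx[order[:top_k]]
    return tokens[keep], positions[keep]

def recurrent_update(state_tokens, state_pos, new_tokens, new_pos, cam_pos, cam_dir):
    """One recurrent step: drop poorly visible tokens, append new observations.

    Filtering before appending keeps the token set (and hence attention cost)
    bounded as the sequence grows, instead of quadratic in its length.
    """
    t, p = filter_visible_tokens(state_tokens, state_pos, cam_pos, cam_dir)
    return np.concatenate([t, new_tokens]), np.concatenate([p, new_pos])
```

The design point this sketch illustrates is that only a bounded, view-relevant subset of the scene state is carried forward each step, which is what makes long sequences tractable for a recurrent reconstructor.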