DriveFix: Spatio-Temporally Coherent Driving Scene Restoration

📅 2026-03-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing methods for 4D driving scene reconstruction often suffer from spatial misalignment across cameras and temporal drift, because they process frames independently or view by view without enforcing spatio-temporal consistency. This work proposes DriveFix, a framework that, for the first time, explicitly integrates spatio-temporal consistency constraints into a diffusion model. DriveFix employs an interleaved diffusion transformer to jointly model cross-view spatial alignment and temporal dependencies, complemented by a geometry-aware loss and history-conditioned generation so that outputs adhere to a coherent 3D structure. Evaluated on Waymo, nuScenes, and PandaSet, the method achieves state-of-the-art performance in both scene reconstruction and novel view synthesis, reducing visual artifacts and improving texture stability, thereby advancing high-fidelity 4D world modeling toward real-world deployment.

📝 Abstract
Recent advancements in 4D scene reconstruction, particularly those leveraging diffusion priors, have shown promise for novel view synthesis in autonomous driving. However, these methods often process frames independently or in a view-by-view manner, leading to a critical lack of spatio-temporal synergy. This results in spatial misalignment across cameras and temporal drift in sequences. We propose DriveFix, a novel multi-view restoration framework that ensures spatio-temporal coherence for driving scenes. Our approach employs an interleaved diffusion transformer architecture with specialized blocks to explicitly model both temporal dependencies and cross-camera spatial consistency. By conditioning the generation on historical context and integrating geometry-aware training losses, DriveFix enforces that the restored views adhere to a unified 3D geometry. This enables the consistent propagation of high-fidelity textures and significantly reduces artifacts. Extensive evaluations on the Waymo, nuScenes, and PandaSet datasets demonstrate that DriveFix achieves state-of-the-art performance in both reconstruction and novel view synthesis, marking a substantial step toward robust 4D world modeling for real-world deployment.
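The abstract does not describe the internals of the interleaved blocks. As a rough illustration only, the following minimal NumPy sketch assumes the common factorized-attention pattern: features for a multi-camera clip are held in an array of shape (T, V, N, C) (frames, cameras, tokens per view, channels), a cross-view block attends across all cameras within one frame, and a temporal block attends across frames for each camera and token position. The function names, the identity query/key/value projections, and the block order are hypothetical simplifications, not DriveFix's published design.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attend(x: np.ndarray) -> np.ndarray:
    """Single-head self-attention over the sequence axis of x with shape (..., L, C).

    Hypothetical simplification: queries, keys and values are x itself
    (no learned projections), so this only illustrates the mixing pattern.
    """
    scale = np.sqrt(x.shape[-1])
    scores = x @ np.swapaxes(x, -1, -2) / scale  # (..., L, L)
    return softmax(scores, axis=-1) @ x          # (..., L, C)


def cross_view_block(x: np.ndarray) -> np.ndarray:
    # x: (T, V, N, C). Within each frame, every token of every camera
    # attends to all V * N tokens of that frame; frames do not interact.
    T, V, N, C = x.shape
    tokens = x.reshape(T, V * N, C)
    return x + attend(tokens).reshape(T, V, N, C)  # residual connection


def temporal_block(x: np.ndarray) -> np.ndarray:
    # For each (camera, token) position, attend across the T frames;
    # cameras and token positions do not interact here.
    seq = np.transpose(x, (1, 2, 0, 3))            # (V, N, T, C)
    out = np.transpose(attend(seq), (2, 0, 1, 3))  # back to (T, V, N, C)
    return x + out                                  # residual connection


def interleaved_stack(x: np.ndarray, depth: int = 2) -> np.ndarray:
    # Alternate cross-view (spatial) and temporal mixing.
    for _ in range(depth):
        x = cross_view_block(x)
        x = temporal_block(x)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.standard_normal((4, 6, 16, 32))  # 4 frames, 6 cameras, 16 tokens, 32 channels
    print(interleaved_stack(clip).shape)  # (4, 6, 16, 32)
```

In this factorization, one cross-view block followed by one temporal block already lets any (frame, camera, token) position influence any other, while each attention call stays small: T tokens per temporal call and V·N tokens per cross-view call, instead of T·V·N tokens in full joint attention.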
Problem

Research questions and friction points this paper is trying to address.

spatio-temporal coherence
4D scene reconstruction
autonomous driving
multi-view synthesis
temporal drift
Innovation

Methods, ideas, or system contributions that make the work stand out.

spatio-temporal coherence
diffusion transformer
multi-view restoration
4D scene reconstruction
geometry-aware training
Authors
Heyu Si
Zhejiang University
Brandon James Denis
Huawei
Muyang Sun
Huawei
Dragos Datcu
Huawei
Computer Vision, Augmented Reality, Remote Collaboration, Affective Computing
Yaoru Li
Zhejiang University, Huawei Technologies
LLM Agents
Xin Jin
Samsung Electronics (China) R&D Center
computer vision, machine learning, pattern recognition
Ruiju Fu
Huawei
Yuliia Tatarinova
Huawei
Federico Landi
Huawei Technologies
Computer Vision, Deep Learning
Jie Song
Zhejiang University
Mingli Song
Zhejiang University
Qi Guo
Huawei