FRUC: Feedforward Dynamic Scene Reconstruction from Uncalibrated Collaborative Driving Views

📅 2026-05-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses dynamic 3D reconstruction in multi-vehicle cooperative driving scenarios by proposing an efficient, single-pass feed-forward method that requires no explicit calibration. The approach models the multi-vehicle system as a spatio-temporally unstructured ego-centric multi-camera system and introduces an ego-centric causal occlusion field that encodes the evolution of dynamic occlusions as latent priors. A zero-initialized residual denoising mechanism enables non-destructive fusion of cross-vehicle geometry and completion of occluded regions. Combining a visually grounded geometric Transformer backbone with a 3D Gaussian splatting representation, the method reports state-of-the-art results on the V2XReal and UrbanIng-V2X datasets, outperforming existing approaches in both rendering quality and efficiency.

📝 Abstract

We present FRUC, a feed-forward 3D Gaussian splatting framework for dynamic scene reconstruction from uncalibrated collaborative driving views. Existing multi-agent reconstruction frameworks are often hindered by rigid prerequisites, demanding precise spatial calibration and slow per-scene optimization. In this paper, we rethink this task by conceptualizing a distributed multi-vehicle network as a spatio-temporally unstructured ego-centric multi-camera system, where the core challenge lies in enhancing ego-centric occluded geometry through collaboration without degrading the ego's accurately observed visible geometry, while preserving reconstruction efficiency. For efficient reconstruction, FRUC is built upon a visually grounded geometric Transformer backbone to enable one-shot, calibration-free inference from a flexible number of multi-vehicle views. To achieve non-destructive geometric supplementation under uncalibrated cross-agent misalignment, FRUC first introduces an ego-centric causal occlusion field that explicitly derives occlusion evolution as latent priors by modeling agent-wise spatio-temporal correlations. Guided by these occlusion priors, it further formulates cross-agent integration as a deterministic residual denoising process via zero-initialized injection, turning challenging cross-agent fusion into bounded residual learning for robust collaborative blind-spot completion. In extensive evaluations on the real-world V2XReal and UrbanIng-V2X datasets, FRUC sets a new state of the art for scene reconstruction in dynamic collaborative driving environments, significantly outperforming existing methods in both rendering quality and efficiency.
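To make the idea of zero-initialized injection concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the function names, the per-token scalar occlusion gate, and the single linear projection are all assumptions made for illustration. It shows only the general property that a residual branch whose weights start at zero leaves the ego features untouched at initialization, so collaborator information can only be added as a learned correction.

```python
import random


def matvec(weight, vec):
    """Multiply a (rows x cols) weight matrix by a length-cols vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


def zero_init_injection(ego_feat, collab_feat, occlusion_gate, weight):
    """Add an occlusion-gated residual from a collaborator onto ego features.

    Hypothetical interface, assumed for this sketch:
      ego_feat, collab_feat: one feature vector per token (list of lists).
      occlusion_gate: one scalar per token in [0, 1]; 1 means the region is
        fully occluded for the ego, 0 means the ego sees it directly.
      weight: projection matrix of the residual branch (all zeros at init).
    """
    fused = []
    for ego, collab, gate in zip(ego_feat, collab_feat, occlusion_gate):
        residual = matvec(weight, collab)
        fused.append([e + gate * r for e, r in zip(ego, residual)])
    return fused


dim, tokens = 4, 3
rng = random.Random(0)
ego_feat = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(tokens)]
collab_feat = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(tokens)]
occlusion_gate = [0.0, 0.5, 1.0]  # visible, partially occluded, fully occluded

# At initialization the residual branch is all zeros, so fusion is an identity.
zero_weight = [[0.0] * dim for _ in range(dim)]
fused_at_init = zero_init_injection(ego_feat, collab_feat, occlusion_gate, zero_weight)

# An identity matrix stands in for weights after training (assumption).
trained_weight = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
fused_trained = zero_init_injection(ego_feat, collab_feat, occlusion_gate, trained_weight)
```

With the zero weights, `fused_at_init` equals `ego_feat`. With the stand-in trained weights, the token whose gate is 0 still equals the ego feature, while the fully occluded token receives the full collaborator residual. In this sketch, that is how the gate keeps visible geometry from being overwritten.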

Problem

Research questions and friction points this paper is trying to address.

dynamic scene reconstruction

uncalibrated collaborative views

occlusion completion

multi-agent perception

3D Gaussian splatting

Innovation

Methods, ideas, or system contributions that make the work stand out.

feedforward reconstruction

uncalibrated multi-view

3D Gaussian splatting