ReCamDriving: LiDAR-Free Camera-Controlled Novel Trajectory Video Generation

📅 2025-12-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing methods suffer from two key limitations: repair-based paradigms struggle to recover complex visual artifacts, while LiDAR-guided approaches rely on sparse, incomplete point clouds and therefore offer only coarse camera control and weak geometric guidance. This paper introduces ReCamDriving, described as the first purely vision-based, camera-controllable novel-trajectory video generation framework. It uses dense, scene-complete 3D Gaussian Splatting (3DGS) renderings as a geometric prior to synthesize high-fidelity driving videos along new camera trajectories. The authors propose a two-stage training paradigm and a cross-trajectory 3DGS data curation strategy that derives multi-trajectory supervision at scale from monocular videos, and they use this strategy to build the ParaDrive multi-trajectory dataset. Experiments report state-of-the-art camera controllability and structural consistency, with improved geometric fidelity in the generated videos.

📝 Abstract
We propose ReCamDriving, a purely vision-based, camera-controlled novel-trajectory video generation framework. While repair-based methods fail to restore complex artifacts and LiDAR-based approaches rely on sparse and incomplete cues, ReCamDriving leverages dense and scene-complete 3DGS renderings for explicit geometric guidance, achieving precise camera-controllable generation. To mitigate overfitting to restoration behaviors when conditioned on 3DGS renderings, ReCamDriving adopts a two-stage training paradigm: the first stage uses camera poses for coarse control, while the second stage incorporates 3DGS renderings for fine-grained viewpoint and geometric guidance. Furthermore, we present a 3DGS-based cross-trajectory data curation strategy to eliminate the train-test gap in camera transformation patterns, enabling scalable multi-trajectory supervision from monocular videos. Based on this strategy, we construct the ParaDrive dataset, containing over 110K parallel-trajectory video pairs. Extensive experiments demonstrate that ReCamDriving achieves state-of-the-art camera controllability and structural consistency.
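To make the idea of a "parallel trajectory" concrete, here is a minimal sketch of how a laterally shifted camera path could be derived from a recorded one. It is an illustration only, not the paper's data curation code: the abstract does not say how ParaDrive trajectories are generated. The function names, the 4x4 camera-to-world pose layout, and the convention that the camera x-axis points to the right are all assumptions.

```python
from typing import List

# Assumption: a pose is a 4x4 camera-to-world matrix stored row-major,
# with the camera x-axis pointing to the right of the viewing direction.
Matrix = List[List[float]]


def offset_pose_laterally(pose: Matrix, offset_m: float) -> Matrix:
    """Return a copy of `pose` translated by `offset_m` along the camera's x-axis."""
    # Column 0 of the rotation block is the camera x-axis in world coordinates.
    right = [pose[r][0] for r in range(3)]
    shifted = [row[:] for row in pose]  # copy rows so the input is not modified
    for r in range(3):
        shifted[r][3] = pose[r][3] + offset_m * right[r]
    return shifted


def parallel_trajectory(poses: List[Matrix], offset_m: float) -> List[Matrix]:
    """Shift every pose of a trajectory sideways by the same distance."""
    return [offset_pose_laterally(p, offset_m) for p in poses]


# Hypothetical input: three poses moving forward along the world z-axis.
recorded = [
    [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, float(t)],
     [0.0, 0.0, 0.0, 1.0]]
    for t in range(3)
]
shifted = parallel_trajectory(recorded, 2.0)  # 2 m to the camera's right
```

In a pipeline of the kind the abstract describes, a 3DGS reconstruction of the scene would then be rendered from the shifted poses to obtain the geometric condition for the generator; that rendering step is outside this sketch.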
Problem

Research questions and friction points this paper is trying to address.

Generating novel-trajectory driving videos from camera input alone, without LiDAR
Overcoming the limitations of repair-based and LiDAR-dependent methods
Ensuring precise camera control and structural consistency in the generated videos
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses dense 3DGS renderings for geometric guidance
Employs two-stage training with coarse and fine control
Introduces cross-trajectory data curation for scalable supervision
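The two-stage paradigm listed above can be pictured as a simple conditioning schedule. The sketch below is a hedged illustration, not the authors' training code: the function name, the step-based switch, and the choice to keep camera-pose conditioning active in the second stage are assumptions (the abstract only says that stage two "incorporates" 3DGS renderings).

```python
from typing import Tuple


def active_conditions(step: int, stage1_steps: int) -> Tuple[str, ...]:
    """Return the conditioning signals assumed to be active at a training step.

    Stage 1 (step < stage1_steps): camera poses only, for coarse control.
    Stage 2 (afterwards): camera poses plus 3DGS renderings, for fine-grained
    viewpoint and geometric guidance. Keeping the pose signal in stage 2 is
    an assumption of this sketch.
    """
    if step < 0 or stage1_steps < 0:
        raise ValueError("step and stage1_steps must be non-negative")
    if step < stage1_steps:
        return ("camera_pose",)
    return ("camera_pose", "3dgs_rendering")


# Hypothetical schedule: switch to stage 2 after 1000 steps.
early = active_conditions(10, 1000)
late = active_conditions(1000, 1000)
```

Introducing the 3DGS renderings only in the second stage matches the motivation given in the abstract: conditioning on them from the start risks overfitting to restoration behaviors.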
Authors

Yaokun Li (Sun Yat-sen University)
Shuaixian Wang (Sun Yat-sen University; Shenzhen Polytechnic University)
Mantang Guo (ZYT)
Jiehui Huang (Sun Yat-sen University): Machine Learning, Computer Vision, Embodied AI, Materials Science
Taojun Ding (ZYT)
Mu Hu (The Hong Kong University of Science and Technology)
Kaixuan Wang (ZYT)
Shaojie Shen (Associate Professor, Hong Kong University of Science and Technology): Robotics
Guang Tan (School of Intelligent Systems Engineering, Sun Yat-sen University): Machine Learning, Mobile Computing, Networking