CRUISE: Cooperative Reconstruction and Editing in V2X Scenarios using Gaussian Splatting

📅 2025-07-24
🤖 AI Summary
To address the challenges of low-fidelity environment reconstruction and editing and of unrealistic, uncontrollable data augmentation in V2X scenarios, this paper proposes CRUISE, the first high-fidelity, editable 3D reconstruction framework tailored for vehicle-infrastructure cooperation. Methodologically, it introduces (1) a decomposed Gaussian Splatting representation enabling structured, editable 3D modeling of dynamic traffic agents; (2) the first integration of differentiable rendering into V2X data augmentation, supporting multi-view joint synthesis and consistency optimization across vehicle-mounted and roadside sensors; and (3) a V2X collaborative simulation augmentation pipeline unifying real-world reconstruction with controllable virtual editing. Evaluated on the V2X-Seq benchmark, the method improves 3D detection across vehicle, infrastructure, and cooperative views as well as cooperative 3D tracking, and it generates diverse, challenging corner cases for model training and robustness evaluation.

📝 Abstract
Vehicle-to-everything (V2X) communication plays a crucial role in autonomous driving, enabling cooperation between vehicles and infrastructure. While simulation has significantly contributed to various autonomous driving tasks, its potential for data generation and augmentation in V2X scenarios remains underexplored. In this paper, we introduce CRUISE, a comprehensive reconstruction-and-synthesis framework designed for V2X driving environments. CRUISE employs decomposed Gaussian Splatting to accurately reconstruct real-world scenes while supporting flexible editing. By decomposing dynamic traffic participants into editable Gaussian representations, CRUISE allows for seamless modification and augmentation of driving scenes. Furthermore, the framework renders images from both ego-vehicle and infrastructure views, enabling large-scale V2X dataset augmentation for training and evaluation. Our experimental results demonstrate that: 1) CRUISE reconstructs real-world V2X driving scenes with high fidelity; 2) using CRUISE improves 3D detection across ego-vehicle, infrastructure, and cooperative views, as well as cooperative 3D tracking on the V2X-Seq benchmark; and 3) CRUISE effectively generates challenging corner cases.

Problem

Research questions and friction points this paper is trying to address.

Reconstructing real-world V2X driving scenes with high fidelity using Gaussian Splatting
Editing dynamic traffic participants flexibly within the reconstructed scenes
Augmenting V2X datasets for training and evaluation, a use of simulation that remains underexplored

Innovation

Methods, ideas, or system contributions that make the work stand out.

Decomposed Gaussian Splatting for scene reconstruction
Editable Gaussian representations for dynamic objects
Multi-view rendering for V2X dataset augmentation
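
The idea behind a decomposed, editable representation can be illustrated with a minimal sketch. This is not CRUISE's implementation: the `GaussianSet` container, the function names, and the yaw-plus-translation pose are assumptions made for illustration, and opacity, color, and rendering are omitted. The sketch assumes each dynamic object stores its Gaussians in its own local frame, so an edit only changes the object's pose before the scene is recomposed.

```python
import numpy as np

def rotation_z(yaw):
    """3x3 rotation about the vertical (z) axis by `yaw` radians."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

class GaussianSet:
    """Hypothetical container: Gaussian means (N, 3) and covariances (N, 3, 3)."""
    def __init__(self, means, covs):
        self.means = np.asarray(means, dtype=float)
        self.covs = np.asarray(covs, dtype=float)

def to_world(obj, yaw, translation):
    """Place object-local Gaussians in the world frame with a rigid pose."""
    R = rotation_z(yaw)
    means = obj.means @ R.T + np.asarray(translation, dtype=float)
    covs = R @ obj.covs @ R.T  # Sigma' = R Sigma R^T, broadcast over N
    return GaussianSet(means, covs)

def compose_scene(background, objects):
    """Concatenate the static background with the posed dynamic objects.

    `objects` maps an object id to (GaussianSet, yaw, translation).
    """
    parts = [background] + [to_world(g, yaw, t) for g, yaw, t in objects.values()]
    return GaussianSet(
        np.concatenate([p.means for p in parts]),
        np.concatenate([p.covs for p in parts]),
    )

# Toy scene: four background Gaussians and one single-Gaussian "car".
background = GaussianSet(np.zeros((4, 3)), np.tile(np.eye(3), (4, 1, 1)))
car = GaussianSet([[1.0, 0.0, 0.0]], [np.diag([4.0, 1.0, 1.0])])
objects = {"car_0": (car, 0.0, [10.0, 0.0, 0.0])}
scene = compose_scene(background, objects)

# Edit: turn the car by 90 degrees and shift it sideways; the background is untouched.
objects["car_0"] = (car, np.pi / 2, [10.0, 3.5, 0.0])
edited = compose_scene(background, objects)
```

Because the composed Gaussians live in a single world frame, the same edited scene could in principle be rendered from both an ego-vehicle camera and a roadside camera, which is what keeps an edit consistent across views.
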

Authors

Haoran Xu
Institute for AI Industry Research (AIR), Tsinghua University; Beijing Institute of Technology; Baidu Inc
Saining Zhang
College of Computing and Data Science, Nanyang Technological University
Computer Vision
Peishuo Li
Nanyang Technological University
Baijun Ye
Tsinghua University
Computer Vision, Embodied AI
Xiaoxue Chen
Institute for AI Industry Research (AIR), Tsinghua University
Huan-ang Gao
Ph.D. student, Tsinghua University
Agent, Vision & Robotics
Jv Zheng
Institute for AI Industry Research (AIR), Tsinghua University
Xiaowei Song
Institute for AI Industry Research (AIR), Tsinghua University
Ziqiao Peng
Renmin University of China
3D Face Animation, Talking Head Generation
Run Miao
Beijing University of Technology
Jinrang Jia
Baidu Inc
Yifeng Shi
Baidu Inc
Guangqi Yi
Baidu Inc
Hang Zhao
Tsinghua University
Hao Tang
Peking University
Hongyang Li
Shanghai AI Lab
Kaicheng Yu
Assistant Professor, Westlake University, PI of Autonomous Intelligence Lab
computer vision, 3D understanding, autonomous perception, automatic machine learning
Hao Zhao
Institute for AI Industry Research (AIR), Tsinghua University; Beijing Academy of Artificial Intelligence