Transfer Your Perspective: Controllable 3D Generation from Any Viewpoint in a Driving Scene

📅 2025-02-10
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Single-view autonomous driving perception suffers from occlusions and poor detection of distant objects. To address this, we propose Transfer Your Perspective (TYP), the first controllable framework for generating multi-view 3D perception data conditioned on a real ego-vehicle's sensor data, trained jointly on simulated cooperative data and real ego-car data. TYP employs a conditional diffusion model that enforces cross-view semantic consistency and geometric plausibility through explicit multi-view geometric constraints, simulation-to-real co-training, and 3D layout-guided sampling. The method can efficiently turn any single-view driving dataset into a high-fidelity, semantically and spatially consistent virtual cooperative-perception dataset. This synthesized data significantly improves pretraining of early- and late-fusion cooperative perception algorithms, enabling robust downstream connected and autonomous vehicle (CAV) development with little or no real-world cooperative annotation.
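The conditional-diffusion idea above can be sketched in a few lines. The toy below runs DDPM-style ancestral sampling in which the noise predictor is conditioned on an ego-view signal, so generated "collaborator views" land where the ego data says they should. For the demo, the learned network is replaced by the closed-form noise predictor for a point-mass target at the conditioning vector, so no training is needed. All names here (`denoiser`, `ego_condition`, the 2-D state) are illustrative assumptions, not TYP's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: the generated "collaborator view" x is a 2-D point; the
# ego-view condition determines where consistent samples should land.
T = 50
betas = np.linspace(1e-4, 0.05, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def denoiser(x_t, t, cond):
    """Stand-in for a learned eps-predictor conditioned on ego data.

    For a point-mass target at `cond`, the optimal noise prediction
    has a closed form, so no training is needed for this demo.
    """
    ab = alpha_bars[t]
    return (x_t - np.sqrt(ab) * cond) / np.sqrt(1.0 - ab)

def sample(cond):
    """DDPM ancestral sampling, guided by the ego-view condition."""
    x = rng.standard_normal(2)
    for t in reversed(range(T)):
        eps = denoiser(x, t, cond)
        mean = (x - betas[t] / np.sqrt(1 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # Add noise at every step except the last one.
        x = mean + (np.sqrt(betas[t]) * rng.standard_normal(2) if t > 0 else 0)
    return x

ego_condition = np.array([3.0, -1.0])   # e.g. pooled ego-sensor features
samples = np.stack([sample(ego_condition) for _ in range(200)])
print(samples.mean(axis=0))  # samples cluster at the conditioning signal
```

A real system would replace `denoiser` with a trained network over point clouds or images and add the cross-view geometric constraints; the reverse-sampling loop itself is unchanged.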

πŸ“ Abstract
Self-driving cars relying solely on ego-centric perception face limitations in sensing, often failing to detect occluded, faraway objects. Collaborative autonomous driving (CAV) seems like a promising direction, but collecting data for development is non-trivial. It requires placing multiple sensor-equipped agents in a real-world driving scene, simultaneously! As such, existing datasets are limited in locations and agents. We introduce a novel surrogate to the rescue, which is to generate realistic perception from different viewpoints in a driving scene, conditioned on a real-world sample - the ego-car's sensory data. This surrogate has huge potential: it could potentially turn any ego-car dataset into a collaborative driving one to scale up the development of CAV. We present the very first solution, using a combination of simulated collaborative data and real ego-car data. Our method, Transfer Your Perspective (TYP), learns a conditioned diffusion model whose output samples are not only realistic but also consistent in both semantics and layouts with the given ego-car data. Empirical results demonstrate TYP's effectiveness in aiding in a CAV setting. In particular, TYP enables us to (pre-)train collaborative perception algorithms like early and late fusion with little or no real-world collaborative data, greatly facilitating downstream CAV applications.
Problem

Research questions and friction points this paper is trying to address.

Generate realistic 3D perception from new viewpoints using only ego-car data.
Turn single-view (ego-only) datasets into collaborative driving datasets.
Improve CAV perception by combining simulated and real data.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates realistic multi-view perception
Combines simulated and real ego-car data
Uses conditioned diffusion model for consistency