ArbiViewGen: Controllable Arbitrary Viewpoint Camera Data Generation for Autonomous Driving via Stable Diffusion Models

📅 2025-08-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Controllable multi-view image generation for autonomous driving is hindered by the scarcity of real-world images from extrapolated (novel) camera viewpoints. Method: We propose a diffusion-based approach built upon Stable Diffusion that requires no ground-truth supervision for extrapolated views. The method introduces a hierarchical camera-pose matching strategy, an enhanced feature-matching algorithm, a feature-aware adaptive view-stitching mechanism, and a cross-view consistency self-supervised objective. Latent-space alignment is guided by clustering analysis, and geometric and photometric consistency are jointly optimized via a self-supervised reconstruction loss. Contribution/Results: To our knowledge, this is the first method enabling high-fidelity, viewpoint-controllable synthesis of arbitrary virtual-camera images across diverse vehicle configurations. It significantly enhances data-augmentation and simulation capabilities in complex driving scenarios without relying on novel-view ground-truth annotations.
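The coarse-to-fine matching pipeline described in the summary can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `coarse_pose_overlap` and `cluster_matches` are hypothetical stand-ins, using pose-based projection for the coarse geometric stage and a greedy clustering pass over match displacements to pick out high-confidence regions.

```python
import numpy as np

def coarse_pose_overlap(K, R_ab, t_ab, pts_a):
    """Coarse stage (hypothetical sketch): project 3D points seen from
    camera A into camera B via the relative pose, keeping only points
    that land in front of camera B."""
    pts_b = (R_ab @ pts_a.T).T + t_ab          # rigid transform A -> B
    in_front = pts_b[:, 2] > 0                  # positive depth in B
    uv = (K @ pts_b[in_front].T).T
    uv = uv[:, :2] / uv[:, 2:3]                 # perspective divide
    return uv, in_front

def cluster_matches(displacements, radius=2.0, min_size=3):
    """Fine stage stand-in: greedily cluster match displacement vectors
    and keep only clusters large enough to count as a high-confidence
    matching region; isolated matches are treated as outliers."""
    remaining = list(range(len(displacements)))
    keep = np.zeros(len(displacements), dtype=bool)
    while remaining:
        seed = remaining[0]
        d = np.linalg.norm(displacements[remaining] - displacements[seed], axis=1)
        members = [remaining[i] for i in np.where(d < radius)[0]]
        if len(members) >= min_size:
            keep[members] = True
        remaining = [i for i in remaining if i not in members]
    return keep
```

In practice the fine stage would use a learned or classical feature matcher (the paper's "improved feature matching algorithms") rather than raw displacements; the clustering step stands in for the paper's clustering-based selection of reliable regions.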

📝 Abstract
Arbitrary viewpoint image generation holds significant potential for autonomous driving, yet remains a challenging task due to the lack of ground-truth data for extrapolated views, which hampers the training of high-fidelity generative models. In this work, we propose ArbiViewGen, a novel diffusion-based framework for the generation of controllable camera images from arbitrary points of view. To address the absence of ground-truth data in unseen views, we introduce two key components: Feature-Aware Adaptive View Stitching (FAVS) and Cross-View Consistency Self-Supervised Learning (CVC-SSL). FAVS employs a hierarchical matching strategy that first establishes coarse geometric correspondences using camera poses, then performs fine-grained alignment through improved feature matching algorithms, and identifies high-confidence matching regions via clustering analysis. Building upon this, CVC-SSL adopts a self-supervised training paradigm in which the model reconstructs the original camera views from the synthesized stitched images using a diffusion model, enforcing cross-view consistency without requiring supervision from extrapolated data. Our framework requires only multi-camera images and their associated poses for training, eliminating the need for additional sensors or depth maps. To our knowledge, ArbiViewGen is the first method capable of controllable arbitrary-view camera image generation across multiple vehicle configurations.
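The CVC-SSL objective described in the abstract follows the standard epsilon-prediction diffusion loss, conditioned on the FAVS-stitched image so that only original camera views are ever needed as targets. A minimal sketch, assuming the usual DDPM forward-noising schedule; `denoise_fn` is a hypothetical stand-in for the conditional U-Net, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(0)

def diffusion_recon_loss(denoise_fn, x0, cond, t, alphas_cumprod):
    """One training step of a conditional diffusion loss in the CVC-SSL
    spirit: x0 is an ORIGINAL camera view and cond is the stitched image
    produced by FAVS, so no extrapolated-view ground truth is required."""
    eps = rng.standard_normal(x0.shape)                      # true noise
    a_bar = alphas_cumprod[t]
    x_t = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps   # forward noising
    eps_hat = denoise_fn(x_t, cond, t)                       # predicted noise
    return float(np.mean((eps_hat - eps) ** 2))              # MSE objective
```

Minimizing this loss over original views, with the stitched image as conditioning, is what enforces cross-view consistency without any novel-view labels.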
Problem

Research questions and friction points this paper is trying to address.

Generates arbitrary viewpoint images for autonomous driving
Addresses lack of ground-truth data for unseen views
Uses diffusion models for controllable view synthesis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Feature-Aware Adaptive View Stitching (FAVS) algorithm
Cross-View Consistency Self-Supervised Learning (CVC-SSL)
Diffusion-based arbitrary-viewpoint image generation
Yatong Lan — School of Vehicle and Mobility, Tsinghua University, Beijing 100084, China
Jingfeng Chen — School of Vehicle and Mobility, Tsinghua University, Beijing 100084, China
Yiru Wang — University of Pittsburgh
Lei He — School of Vehicle and Mobility, Tsinghua University, Beijing 100084, China