Hallucinating 360°: Panoramic Street-View Generation via Local Scenes Diffusion and Probabilistic Prompting

📅 2025-07-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Addressing the high cost of acquiring 360° street-view data and the lack of controllability in existing generative models for autonomous-driving panoramic perception, this paper proposes Percep360, the first control-signal-driven panoramic street-scene generation method. The approach models spatially continuous generation via diffusion processes and introduces two key innovations: (1) a local-scene diffusion mechanism to mitigate the geometric and textural distortions inherent in pinhole imaging; and (2) a probabilistic prompting mechanism that dynamically fuses multi-source control signals (e.g., semantic maps, depth maps) to enhance cross-view consistency and conditional controllability. Percep360 is evaluated with both reference-free and reference-based image quality metrics (e.g., LPIPS), as well as downstream BEV segmentation performance. Experiments demonstrate that Percep360 outperforms conventional stitching-based baselines in perceptual fidelity and achieves significant mIoU gains in BEV segmentation, validating its effectiveness and utility for real-world perception tasks.

📝 Abstract
Panoramic perception holds significant potential for autonomous driving, enabling vehicles to acquire a comprehensive 360° surround view in a single shot. However, autonomous driving is a data-driven task: complete panoramic data acquisition requires complex sampling systems and annotation pipelines, which are time-consuming and labor-intensive. Although existing street-view generation models have demonstrated strong data regeneration capabilities, they can only learn from the fixed data distribution of existing datasets and cannot achieve high-quality, controllable panoramic generation. In this paper, we propose Percep360, the first panoramic generation method for autonomous driving. Percep360 enables coherent, control-signal-conditioned generation of panoramic data built on stitched panoramic data, focusing on two key aspects: coherence and controllability. Specifically, to overcome the inherent information loss caused by the pinhole sampling process, we propose the Local Scenes Diffusion Method (LSDM). LSDM reformulates panorama generation as a spatially continuous diffusion process, bridging the gaps between different data distributions. Additionally, to achieve controllable generation of panoramic images, we propose a Probabilistic Prompting Method (PPM), which dynamically selects the most relevant control cues. We evaluate the generated images from three perspectives: image quality assessment (both no-reference and reference-based), controllability, and utility for real-world Bird's Eye View (BEV) segmentation. Notably, the generated data consistently outperforms the original stitched images on no-reference quality metrics and enhances downstream perception models. The source code will be publicly available at https://github.com/Bryant-Teng/Percep360.
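
The "spatially continuous" framing in LSDM can be pictured with a minimal, hypothetical sketch: wrap the panorama horizontally across its 0°/360° seam (circular padding) so that any local window sees coherent context on both sides. This illustrates the general wrap-around trick only, not Percep360's implementation; `circular_pad` and the list-of-rows image representation are my own assumptions.

```python
def circular_pad(panorama, pad):
    """Wrap each row of a panorama horizontally so the left and right
    edges share context across the 0/360-degree seam. The panorama is
    a list of rows (lists of pixel values); `pad` columns from each
    side are copied to the opposite side."""
    return [row[-pad:] + row + row[:pad] for row in panorama]

# A 2x4 toy panorama: after padding by 1, each row gains the far-edge
# columns on both sides, so a sliding window never sees a hard seam.
pano = [[0, 1, 2, 3],
        [4, 5, 6, 7]]
padded = circular_pad(pano, 1)
# padded == [[3, 0, 1, 2, 3, 0], [7, 4, 5, 6, 7, 4]]
```

In a real diffusion model this kind of padding would typically be applied to feature maps before each convolution, so the denoiser treats the panoramic seam like any interior region.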
Problem

Research questions and friction points this paper is trying to address.

Acquiring and annotating complete 360° panoramic street-view data is costly and labor-intensive
Pinhole sampling and stitching lose information and introduce geometric and textural distortions
Existing street-view generation models learn a fixed data distribution and lack controllable panoramic generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Local Scenes Diffusion Method for panorama generation
Probabilistic Prompting Method for controllability
Spatially continuous diffusion process for coherence
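
One rough intuition for how PPM might "dynamically select the most relevant control cues" is a softmax over per-cue relevance scores followed by sampling. This is a hedged sketch of the idea only; the function name, cue names, and the softmax formulation are my assumptions, not the paper's actual mechanism.

```python
import math
import random

def pick_control_cue(cue_scores, temperature=1.0, seed=0):
    """Turn per-cue relevance scores into a softmax distribution and
    sample one control signal from it. Hypothetical sketch of
    probabilistic prompting; Percep360's real mechanism may differ."""
    names = list(cue_scores)
    z = [cue_scores[n] / temperature for n in names]
    m = max(z)                               # subtract max for stability
    weights = [math.exp(v - m) for v in z]
    total = sum(weights)
    probs = {n: w / total for n, w in zip(names, weights)}
    chosen = random.Random(seed).choices(names, weights=weights, k=1)[0]
    return chosen, probs

# Lower temperature sharpens the distribution toward the top-scoring cue.
cue, probs = pick_control_cue(
    {"semantic_map": 2.0, "depth_map": 1.0, "layout": 0.5},
    temperature=0.2,
)
```

Sampling rather than always taking the argmax keeps weaker cues occasionally active, which is one way such a mechanism could trade strict controllability against generative diversity.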
Fei Teng
Reader in Intelligent Energy Systems, Imperial College London
Stability-constrained Optimisation, Cyber-resilient System Operation, Data Privacy and Trading
Kai Luo
School of Robotics and the National Engineering Research Center of Robot Visual Perception and Control Technology, Hunan University, Changsha 410082, China
Sheng Wu
School of Robotics and the National Engineering Research Center of Robot Visual Perception and Control Technology, Hunan University, Changsha 410082, China
Siyu Li
University of Illinois at Chicago
Robotics, Micro-robot Swarms, Human-robot Interaction, Control and Motion Planning
Pujun Guo
School of Robotics and the National Engineering Research Center of Robot Visual Perception and Control Technology, Hunan University, Changsha 410082, China
Jiale Wei
Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology, 76131 Karlsruhe, Germany
Kunyu Peng
Karlsruhe Institute of Technology
Video Understanding, Open Set Recognition, Generalizable Deep Learning
Jiaming Zhang
Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology, 76131 Karlsruhe, Germany; Department of Computer Science, ETH Zürich, CH-8092 Zürich, Switzerland
Kailun Yang
Professor, School of Artificial Intelligence and Robotics, Hunan University (HNU); KIT; UAH; ZJU
Computer Vision, Computational Optics, Intelligent Vehicles, Autonomous Driving, Robotics