🤖 AI Summary
Addressing the high cost of acquiring 360° street-view data and the lack of controllability in existing generative models for autonomous driving panoramic perception, this paper proposes Percep360, the first control-signal-driven panoramic street-scene generation method. The approach models spatially continuous generation via diffusion processes and introduces two key innovations: (1) a local-scene diffusion mechanism to mitigate the geometric and textural distortions inherent in pinhole imaging; and (2) a probabilistic prompting mechanism that dynamically fuses multi-source control signals (e.g., semantic maps, depth maps) to enhance cross-view consistency and conditional controllability. The authors evaluate Percep360 using both no-reference and full-reference image quality metrics (e.g., LPIPS), as well as downstream BEV segmentation performance. Experiments demonstrate that Percep360 outperforms conventional stitching-based baselines in perceptual fidelity and achieves significant mIoU gains in BEV segmentation, validating its effectiveness and utility for real-world perception tasks.
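The summary describes generation as spatially continuous over the 360° panorama. The paper's LSDM formulation is not reproduced here, but one generic way to make an equirectangular panorama spatially continuous at the 0°/360° seam is circular (wrap-around) padding along the width axis, so that convolutions see the left and right borders as adjacent. The sketch below is only a loose illustration of that continuity idea; the function name and toy data are hypothetical, not from the paper.

```python
import numpy as np

def wrap_pad(panorama: np.ndarray, pad: int) -> np.ndarray:
    """Pad an equirectangular panorama (H, W, C) circularly along width,
    so the left/right borders (the 0/360 degree seam) stay continuous."""
    return np.concatenate(
        [panorama[:, -pad:], panorama, panorama[:, :pad]], axis=1
    )

# Toy panorama: width indexed 0..7, so the seam joins columns 7 and 0.
pano = np.arange(8, dtype=float).reshape(1, 8, 1)
padded = wrap_pad(pano, pad=2)

# Columns pulled from the far side now precede column 0 and follow column 7.
print(padded[0, :, 0])  # [6. 7. 0. 1. 2. 3. 4. 5. 6. 7. 0. 1.]
```

The same effect is available in deep-learning frameworks (e.g. circular padding modes for convolutions), which is how seam-free panoramic feature maps are commonly obtained.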
📝 Abstract
Panoramic perception holds significant potential for autonomous driving, enabling vehicles to acquire a comprehensive 360° surround view in a single shot. However, autonomous driving is a data-driven task, and complete panoramic data acquisition requires complex sampling systems and annotation pipelines, which are time-consuming and labor-intensive. Although existing street-view generation models have demonstrated strong generation capabilities, they can only learn the fixed data distribution of existing datasets and cannot achieve high-quality, controllable panoramic generation. In this paper, we propose Percep360, the first panoramic generation method for autonomous driving. Percep360 enables coherent, control-signal-conditioned generation of panoramic data from stitched panoramas, focusing on two key aspects: coherence and controllability. Specifically, to overcome the inherent information loss caused by the pinhole sampling process, we propose the Local Scenes Diffusion Method (LSDM). LSDM reformulates panorama generation as a spatially continuous diffusion process, bridging the gaps between different data distributions. Additionally, to achieve controllable generation of panoramic images, we propose a Probabilistic Prompting Method (PPM). PPM dynamically selects the most relevant control cues, enabling controllable panoramic image generation. We evaluate the effectiveness of the generated images from three perspectives: image quality assessment (both no-reference and full-reference), controllability, and utility in real-world Bird's Eye View (BEV) segmentation. Notably, the generated data consistently outperforms the original stitched images on no-reference quality metrics and enhances downstream perception models. The source code will be publicly available at https://github.com/Bryant-Teng/Percep360.
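The abstract says PPM "dynamically selects the most relevant control cues" from several control signals. The paper's actual mechanism is not given here; as a minimal sketch of the general idea, one can score each cue and draw one cue with probability proportional to a softmax over the scores. All names, scores, and the temperature parameter below are hypothetical illustrations, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility

def select_control(signals: dict, temperature: float = 1.0) -> str:
    """Sample one control cue (e.g. 'semantic_map', 'depth_map') with
    probability proportional to exp(relevance / temperature)."""
    names = list(signals)
    logits = np.array([signals[n] for n in names]) / temperature
    probs = np.exp(logits - logits.max())  # shift for numerical stability
    probs /= probs.sum()
    return rng.choice(names, p=probs)

# Hypothetical relevance scores; higher-scoring cues are sampled more often.
scores = {"semantic_map": 2.0, "depth_map": 1.0, "text_prompt": 0.5}
picks = [select_control(scores) for _ in range(1000)]
print(max(set(picks), key=picks.count))  # most frequently selected cue
```

Sampling (rather than always taking the argmax) keeps every control signal in play during training while still favoring the most relevant one, which is one plausible reading of "probabilistic prompting."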