Veila: Panoramic LiDAR Generation from a Monocular RGB Image

📅 2025-08-05
🤖 AI Summary
To address weak controllability, coarse spatial control, and difficult cross-modal alignment in monocular RGB-to-panoramic LiDAR generation, this paper proposes a controllable generation framework based on conditional diffusion. Its key contributions are: (1) a confidence-aware mechanism that jointly modulates semantic and depth cues for adaptive multi-cue fusion; (2) geometry-driven cross-modal alignment coupled with panoramic feature consistency constraints, for robust 3D structural and global semantic alignment; and (3) cross-modal semantic and depth consistency metrics for quantitative evaluation. The method achieves state-of-the-art generation quality on nuScenes, SemanticKITTI, and KITTI-Weather, and the generated LiDAR data improves downstream semantic segmentation performance. By enabling controllable, geometry-aware LiDAR synthesis from single-view RGB inputs, the approach points toward low-cost, controllable multimodal simulation.

📝 Abstract
Realistic and controllable panoramic LiDAR data generation is critical for scalable 3D perception in autonomous driving and robotics. Existing methods either perform unconditional generation with poor controllability or adopt text-guided synthesis, which lacks fine-grained spatial control. Leveraging a monocular RGB image as a spatial control signal offers a scalable and low-cost alternative, but it remains an open problem and faces three core challenges: (i) semantic and depth cues from RGB vary spatially in reliability, complicating reliable conditional generation; (ii) modality gaps between RGB appearance and LiDAR geometry amplify alignment errors under noisy diffusion; and (iii) maintaining structural coherence between monocular RGB and panoramic LiDAR is challenging, particularly in regions where the image and the LiDAR scan do not overlap. To address these challenges, we propose Veila, a novel conditional diffusion framework that integrates: a Confidence-Aware Conditioning Mechanism (CACM) that strengthens RGB conditioning by adaptively balancing semantic and depth cues according to their local reliability; a Geometric Cross-Modal Alignment (GCMA) for robust RGB-LiDAR alignment under noisy diffusion; and a Panoramic Feature Coherence (PFC) for enforcing global structural consistency across monocular RGB and panoramic LiDAR. Additionally, we introduce two metrics, Cross-Modal Semantic Consistency and Cross-Modal Depth Consistency, to evaluate alignment quality across modalities. Experiments on nuScenes, SemanticKITTI, and our proposed KITTI-Weather benchmark demonstrate that Veila achieves state-of-the-art generation fidelity and cross-modal consistency, while enabling generative data augmentation that improves downstream LiDAR semantic segmentation.
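The abstract describes CACM only at a high level, as balancing semantic and depth cues according to their local reliability. A minimal sketch of that general idea follows. It is not the authors' implementation: the function names (`softmax`, `fuse_cues`), the scalar per-location features, the softmax weighting, and the `temperature` parameter are all assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a short list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_cues(sem_feat, depth_feat, sem_conf, depth_conf, temperature=1.0):
    """Blend two per-location cues using confidence-derived weights.

    Hypothetical illustration: each location gets its own pair of weights,
    so the cue with the higher local confidence dominates at that location.
    """
    fused = []
    for s, d, cs, cd in zip(sem_feat, depth_feat, sem_conf, depth_conf):
        w_sem, w_depth = softmax([cs / temperature, cd / temperature])
        fused.append(w_sem * s + w_depth * d)
    return fused


# Toy input: three locations, semantic cue = 1.0, depth cue = 0.0 everywhere.
sem_feat = [1.0, 1.0, 1.0]
depth_feat = [0.0, 0.0, 0.0]
sem_conf = [5.0, 0.0, -5.0]    # semantic cue trusted, tied, distrusted
depth_conf = [-5.0, 0.0, 5.0]  # depth cue distrusted, tied, trusted

fused = fuse_cues(sem_feat, depth_feat, sem_conf, depth_conf)
print(fused)
```

In this toy input the fused value follows the semantic cue where its confidence is higher, the depth cue where that one is higher, and an even blend where the two are tied. A learned model would predict the confidences and operate on feature maps rather than scalars.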
Problem

Research questions and friction points this paper is trying to address.

Generating panoramic LiDAR from monocular RGB images
Overcoming modality gaps between RGB and LiDAR
Ensuring structural coherence in panoramic LiDAR generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Confidence-Aware Conditioning Mechanism balances cues
Geometric Cross-Modal Alignment ensures RGB-LiDAR alignment
Panoramic Feature Coherence maintains structural consistency
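To make the overlap problem concrete, the sketch below marks which columns of a 360-degree range image fall inside a single camera's horizontal field of view. It is only an illustration of why most of a panoramic scan lacks direct RGB evidence, not part of Veila: the function names, the equal-angle column layout, and the 90-degree field of view are assumptions.

```python
import math


def column_azimuths(width):
    """Azimuth in radians, in [-pi, pi), at the center of each column.

    Assumes a panoramic range image whose columns evenly divide 360 degrees.
    """
    return [-math.pi + (c + 0.5) * 2.0 * math.pi / width for c in range(width)]


def camera_overlap_mask(width, hfov_deg, cam_yaw_deg=0.0):
    """True for columns whose azimuth lies inside the camera's horizontal FOV."""
    half = math.radians(hfov_deg) / 2.0
    yaw = math.radians(cam_yaw_deg)
    mask = []
    for az in column_azimuths(width):
        # Wrap the angular difference into [-pi, pi) so yaw near +/-180 works.
        diff = (az - yaw + math.pi) % (2.0 * math.pi) - math.pi
        mask.append(abs(diff) <= half)
    return mask


# Hypothetical setup: 360-column range image, forward camera with 90-degree FOV.
mask = camera_overlap_mask(width=360, hfov_deg=90.0)
covered = sum(mask)
print(f"{covered} of {len(mask)} columns overlap the camera view")
```

With these assumed numbers only a quarter of the columns are covered by the image, so the remaining three quarters must be generated without direct image evidence. That is the setting in which a global coherence constraint such as PFC becomes relevant.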
Authors

Youquan Liu: Fudan University (3D Scene Understanding)
Lingdong Kong: National University of Singapore (Computer Vision, Deep Learning)
Weidong Yang: Professor of Computer Science (Big Data)
Ao Liang: National University of Singapore
Jianxiong Gao: Fudan University
Yang Wu: Nanjing University of Science and Technology
Xiang Xu: Nanjing University of Aeronautics and Astronautics
Xin Li: Shanghai AI Laboratory
Linfeng Li: National University of Singapore
Runnan Chen: The University of Hong Kong, The University of Sydney (3D Vision, Machine Learning, Medical Image Analysis)
Ben Fei: The Chinese University of Hong Kong