LiSTAR: Ray-Centric World Models for 4D LiDAR Sequences in Autonomous Driving

📅 2025-11-20
🤖 AI Summary
This paper targets three obstacles in 4D LiDAR data generation: the sensor's spherical geometry, the temporal sparsity of point clouds, and the complexity of dynamic scenes. It introduces LiSTAR, a controllable generative world model for 4D LiDAR sequences that operates directly on the sensor's native geometry. The method has three parts: (1) a Hybrid-Cylindrical-Spherical (HCS) representation that mitigates the quantization artifacts common in Cartesian grids; (2) START, a ray-centric Transformer that models temporal feature evolution along individual sensor rays; and (3) MaskSTART, a discrete masked generative framework conditioned on a 4D point cloud-aligned voxel layout, which enables layout-guided compositional synthesis. Evaluated on reconstruction, prediction, and conditional generation, the method reports state-of-the-art results: a 76% reduction in generation MMD, a 32% improvement in reconstruction IoU, and a 50% decrease in median L1 prediction error.

📝 Abstract
Synthesizing high-fidelity and controllable 4D LiDAR data is crucial for creating scalable simulation environments for autonomous driving. This task is inherently challenging due to the sensor's unique spherical geometry, the temporal sparsity of point clouds, and the complexity of dynamic scenes. To address these challenges, we present LiSTAR, a novel generative world model that operates directly on the sensor's native geometry. LiSTAR introduces a Hybrid-Cylindrical-Spherical (HCS) representation to preserve data fidelity by mitigating quantization artifacts common in Cartesian grids. To capture complex dynamics from sparse temporal data, it utilizes a Spatio-Temporal Attention with Ray-Centric Transformer (START) that explicitly models feature evolution along individual sensor rays for robust temporal coherence. Furthermore, for controllable synthesis, we propose a novel 4D point cloud-aligned voxel layout for conditioning and a corresponding discrete Masked Generative START (MaskSTART) framework, which learns a compact, tokenized representation of the scene, enabling efficient, high-resolution, and layout-guided compositional generation. Comprehensive experiments validate LiSTAR's state-of-the-art performance across 4D LiDAR reconstruction, prediction, and conditional generation, with substantial quantitative gains: reducing generation MMD by a massive 76%, improving reconstruction IoU by 32%, and lowering prediction L1 Med by 50%. This level of performance provides a powerful new foundation for creating realistic and controllable autonomous systems simulations. Project link: https://ocean-luna.github.io/LiSTAR.gitub.io.
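To illustrate why a sensor-aligned grid differs from a Cartesian one, the following is a minimal sketch of a hybrid cylindrical-spherical binning. The abstract does not spell out the exact HCS definition, so the coordinate choice here is an assumption: azimuth and planar range are taken from cylindrical coordinates, and the vertical axis uses the spherical elevation angle. The function names, bin counts, and ranges are hypothetical and are not taken from the paper.

```python
import numpy as np

def cartesian_to_hybrid(points):
    """Map (N, 3) Cartesian points to (azimuth, elevation, planar_range).

    Assumption (not from the paper): azimuth and planar range follow
    cylindrical coordinates, elevation is the spherical elevation angle.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    planar_range = np.hypot(x, y)            # cylindrical radius in the x-y plane
    azimuth = np.arctan2(y, x)               # horizontal angle in [-pi, pi]
    elevation = np.arctan2(z, planar_range)  # angle above the x-y plane
    return np.stack([azimuth, elevation, planar_range], axis=1)

def voxelize(coords, bins, lower, upper):
    """Quantize coordinates into integer voxel indices, dropping out-of-range points."""
    bins = np.asarray(bins)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    scaled = (coords - lower) / (upper - lower) * bins
    idx = np.floor(scaled).astype(int)
    inside = np.all((idx >= 0) & (idx < bins), axis=1)
    return idx[inside]

# Hypothetical grid: 8 azimuth bins, 4 elevation bins, 10 range bins up to 50 m.
points = np.array([[10.0, 0.0, 0.0], [0.0, 5.0, 5.0]])
coords = cartesian_to_hybrid(points)
indices = voxelize(
    coords,
    bins=(8, 4, 10),
    lower=(-np.pi, -np.pi / 2, 0.0),
    upper=(np.pi, np.pi / 2, 50.0),
)
```

Because each angular bin corresponds to a fixed range of ray directions, the cells widen with distance in the same way the sensor's sampling does, whereas a uniform Cartesian grid applies the same cell size to dense near-range and sparse far-range returns.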
Problem

Research questions and friction points this paper is trying to address.

Synthesizing high-fidelity controllable 4D LiDAR data for autonomous driving simulations
Addressing spherical geometry and temporal sparsity challenges in LiDAR point clouds
Modeling complex dynamics from sparse temporal data with robust temporal coherence
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid-Cylindrical-Spherical representation preserves LiDAR data fidelity
Ray-Centric Transformer models feature evolution along sensor rays
Masked Generative framework enables layout-guided compositional generation
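The ray-centric idea in the list above can be sketched as self-attention applied along the time axis, independently for each sensor ray. This is only an illustration under stated assumptions, not the authors' START implementation: it shows the temporal part alone, uses a single head, and the function name, tensor layout `(T, R, C)`, and random projection matrices are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def ray_temporal_attention(feats, w_q, w_k, w_v):
    """Single-head self-attention over time, one sequence per ray.

    feats: (T, R, C) features for T frames and R rays.
    w_q, w_k, w_v: (C, C) projection matrices (hypothetical weights).
    Returns the attended features (T, R, C) and attention weights (R, T, T).
    """
    per_ray = feats.transpose(1, 0, 2)  # (R, T, C): each ray is its own sequence
    q, k, v = per_ray @ w_q, per_ray @ w_k, per_ray @ w_v
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])  # (R, T, T)
    weights = softmax(scores, axis=-1)
    out = weights @ v                   # (R, T, C)
    return out.transpose(1, 0, 2), weights

# Toy input: 4 frames, 6 rays, 8 feature channels.
rng = np.random.default_rng(0)
T, R, C = 4, 6, 8
feats = rng.normal(size=(T, R, C))
w_q, w_k, w_v = (rng.normal(size=(C, C)) for _ in range(3))
out, weights = ray_temporal_attention(feats, w_q, w_k, w_v)
```

In this sketch a ray attends only to its own history, so the attention cost per ray grows with the number of frames rather than with the number of rays; any mixing across rays would have to come from a separate spatial attention step.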

Pei Liu
The Hong Kong University of Science and Technology (Guangzhou)
Songtao Wang
Li Auto Inc.
Lang Zhang
Li Auto Inc.
Xingyue Peng
Li Auto Inc.
Yuandong Lyu
Li Auto Inc.
Jiaxin Deng
Li Auto Inc.
Songxin Lu
Li Auto Inc.
Weiliang Ma
Li Auto Inc.
Xueyang Zhang
Li Auto Inc.
Autonomous Driving, World Model, 3D Vision
Yifei Zhan
Li Auto Inc.
XianPeng Lang
Li Auto Inc.
Jun Ma
The Hong Kong University of Science and Technology